Initially used to track and contain disease outbreaks in the 1940s (Metcalf et al., 1995), wastewater analysis has quickly gained traction as a way to monitor other public health measures such as the usage of illicit drugs (Zuccato et al., 2008). Wastewater based epidemiology (WBE) is a chemical analysis technique used to detect a myriad of compounds for regions serviced by wastewater treatment facilities. The functions of this technique range from monitoring environmental impact of household liquid waste to the very topical application of detecting the presence of Covid-19 (Glassmeyer et al., 2005), (Bentacourt et al., 2021).Following the consumption of illicit drugs, the body enzymatically breaks them down into by-products known as drug metabolites. As the process of metabolizing drugs is the same within all humans, the type and quantity of produced metabolites is already known (Concheiro et al., 2007). With this knowledge, scientists are able to estimate the quantity of drugs consumed (drug usage) within a community by measuring the metabolites within the local wastewater (Zuccato et al., 2008). Estimating the use of illicit drugs within a community is a crucial public health matter due to the known harms that regular consumption can cause (Bannwarth et al., 2019). This information can be used to help direct policies and distribute resources to communities in need. For more information on the ethical considerations of this study, please see Appendix A.
The following analysis is based off a wastewater analysis study which estimated the drug usage of amphetamines, cannabis, cocaine, 3,4-Methylenedioxymethamphetamine (MDMA), and methamphetamine within 137 cities across 29 European countries between 2011 and 2020 (EMCDDA, 2021). In our analysis, we sought to test whether drug usage varies between countries. Previous research suggests that drug usage is related to population-level age demographics; self-reported drug usage tends to be greatest in youth populations (Health Canada, 2010). Consistent with the definition of youth used by the European Monitoring Centre for Drugs and Drug Addiction (EMCDDA), we chose to define youth as those aged 15 to 24. Accordingly, we hypothesized that drug usage will differ between countries, and that differences in the relative proportion of youth (expressed as a percentage, PctYouth) would help explain these differences in drug usage between countries. Previous research also suggests that drug usage is related to population-level wealth demographics (Lipari and Park-Lee, 2019). We chose GDP as a measure of population wealth because it is widely referenced and compared in the literature (Nagelhout et al., 2017). Accordingly, we hypothesized that differences in Gross Domestic Product (GDP) would also explain differences in drug usage between countries. The objectives of this project are outlined in Section 1.2; corresponding hypotheses are summarized in Section 1.3.
This analysis is important because it helps contribute to our understanding of the social determinants of health. Our analysis will specifically help improve our understanding of the characteristics of populations that may correspond to increased drug usage.
Table 1.2. List of research objectives
| No. | Objective |
|---|---|
| 1. | To examine how drug usage differs between European reference countries. |
| 2. | To quantify the effect of the relative number of youth represented by the population of a country (PctYouth) on drug usage by reference country. |
| 3. | To quantify the effect of the relative wealth of a country (GDP) on drug usage by reference country. |
The effect of reference country on drug usage is dependent on the age (PctYouth) and wealth (GDP) characteristics of each reference country.
In an attempt to improve the reliability of the data, only reference countries that had drug usage estimates available for at least six reference cities in the original drug usage data set were considered in our analysis. Thus, the reference countries for our analysis include: Austria, Belgium, Finland, Germany, Slovenia, Spain, Sweden, and Switzerland. The sample for the following analysis was constructed from the observations corresponding to the average weekday drug usage of 5 different types of reference drugs (amphetamines, canabis, cocaine, methamphetamine, and MDMA), measured across reference cities nested within six European reference countries, between the reference years 2011 and 2020. Average weekday usage was chosen because weekend usage data are unavailable for many of the observations in the original drug usage data set. Note that observations corresponding to a number of combinations of type of illicit reference drug, reference city, and reference year are also missing from the original data. For example, many measurements are missing for cannabis; thus, we chose to not include cannabis in our final analysis.
In the following, we use an observational study design to assess how drug usage differs across European reference countries included in our sample. The response variable of this study is drug usage (measured in mg/1000 people/day); the explanatory variable of this study is reference country (Austria, Belgium, Finland, Germany, Slovenia, Spain, Sweden, and Switzerland). We are also interested in two cofactors: PctYouth, the relative number of youth represented by the population of a country (measured as a proportion), and GDP, a measure of the monetary value of economic activity (i.e. the production of goods and services) corresponding to a country (measured in 2010 USD). These cofactors are utilized to quantify the effect of population age and wealth demographics, respectively, on the relationship between drug usage, our explanatory variable, and reference country, the explanatory variable (Section 1.1). Thus, we construct linear models to address our research objectives (Section 6.0). Using these models, we also test for interaction effects between the two cofactors (PctYouth, GDP) used in this analysis.
Throughout this document, bold text represents vocabulary terms that have been given specific, operational definitions for the context of the following analysis. These terms and definitions are provided as a table below.
Table 1.6. Operational definitions.
| Term | Definition |
|---|---|
| Drug Usage | Reported for each reference city and reference drug in mg/1000 people/day (EMCDDA, 2021). For additional information, see Appendix B. |
| Gross Domestic Product (GDP) | The sum of gross value added by all resident producers in the economy, plus any product taxes and minus any subsidies not included in the value of products, reported for each country of interest in 2010 United States Dollars (USD). Calculated without making deductions for depreciation of fabricated assets for depletion and degradation of natural resources (World Bank Data, 2021). For more information on why this metric was chosen, see Section 3.3. |
| PctYouth | The number of individuals in the population that fall within the age range 15-24. Reported for each country of interest as a percentage (UN Department of Economic and Social Affairs, 2019). For more information on why this metric was chosen, see Section 3.2. |
| Reference city | The cities across the 6 European countries that drug usage was estimated in. For additional information, see Appendix B. |
| Reference country | The 6 European countries (Austria, Belgium, Finland, Germany, Slovenia, Spain, Sweden, and Switzerland) corresponding to drug usage estimates used in the present analysis. For ease of recognition, the names of these reference countries are presented in italics throughout this document. For more information on how these countries were selected, see Appendix B. |
| Reference drug | The 5 types of drugs (amphetamines, cannabis, cocaine, MDMA, methamphetamine) corresponding to drug usage estimates used in the present analysis. Throughout this document, the names of these illicit drugs are presented in italics for ease of recognition. For additional information on the types of illicit drugs considered in this analysis, see Appendix B. |
| Reference year | The 10 years (2011-2020) corresponding to drug usage estimates used in the present analysis. For additional information, see Appendix B. |
| Wastewater based epidemiology (WBE) | A chemical analysis technique used to detect compounds in the waters serviced by wastewater treatment facilities. For example, WBE is used in the present analysis to produce estimates of drug usage. For more information on WBE, see Appendix C. |
The following section provides a brief overview of the data acquisition for the drug usage, population age, and population wealth estimates used in our analysis (Section 2.1, Section 2.2, and Section 2.3, respectively). For more information, see the full project metadata in Appendix B.
Drug usage data was accessed through the European Monitoring Center for Drugs and Drug Addiction website (EMCDDA) (https://www.emcdda.europa.eu/publications/html/pods/waste-water-analysis_en#siteInfoTable). This data set looked at the usage of five common illicit drugs (amphetamines, cannabis, cocaine, methamphetamine, and MDMA), as measured in the wastewater of 137 cities across 29 European countries. The data were collected daily (in mg per 1000 people per day) over a span of 10 years (from 2011 to 2020). The data was averaged for each day of the week, the weekend, and the weekdays for each city over every year considered.
The subset of drug use data of interest is constructed from the mean weekday usage of each reference drug in each combination of reference city, nested within reference country, and reference year. For more information, including why weekday mean values were used and missing data in the original drug use data set, see Section 1.4.
Data on population age demographics were accessed from the United Nations Department of Economic and Social Affairs website (https://population.un.org/wpp/). Data were available from 235 countries and regions over a period of 60 years (from 1950-2020). These data show the proportion of the population accounted for by a number of pre-defined age groups (e.g. youth aged 15-24).
The subset of age data of interest was the proportion of the population aged 15-24 (PctYouth) corresponding to combinations of reference countries and reference years included in our study sample (Section 1.4).
Data on population wealth demographics were accessed via the World Bank website (https://data.worldbank.org/indicator/NY.GDP.MKTP.CD?fbclid=IwAR3tVOmvWo6ijRcavhXuUEhoDZgkVFE4DeiD1CtiZDYYiqpVk8sj0kSxxIY). This data set includes annual Gross Domestic Product (GDP; measured in the value of the 2010 US Dollar) for approximately 200 countries and regions between 1960 and 2020.
The subset of wealth data of interest was the GDP corresponding to combinations of reference countries and reference years included in our study sample (Section 1.4).
In this section, we set the R Markdown Document (RMD) up for analysis. Setting the document up for analysis involved two steps, loading the dependency libraries (Section 3.1), and importing data (Section 3.2).
The first step of setting up our RMD was to load the dependency libraries. Descriptions and references for each of these packages are available in Appendix D.
library(ggplot2)
library(dplyr)
library(ggfortify)
library(readr)
library(tidyr)
library(stringr)
library(lubridate)
library(plotrix)
library(lattice)The second step of setting up our RMD was to import the raw data. The raw data files used in this RMD are the raw .csv source data files (Appendix B). These files have also been made available as a part of a GitHub Repository for this project (https://github.com/trevor-lyman/European-drug-usage_2011-2020).
First, we imported the drug usage data.
# step 1: read .csv files
# 2011
amphetamine_2011 <- read.csv("Data/Drug Metabolite Data/2011/WW-data-amphetamine-2011.csv")
cannabis_2011 <- read.csv("Data/Drug Metabolite Data/2011/WW-data-cannabis-2011.csv")
cocaine_2011 <- read.csv("Data/Drug Metabolite Data/2011/WW-data-cocaine-2011.csv")
MDMA_2011 <- read.csv("Data/Drug Metabolite Data/2011/WW-data-MDMA-2011.csv")
methamphetamine_2011 <- read.csv("Data/Drug Metabolite Data/2011/WW-data-methamphetamine-2011.csv")
# 2012
amphetamine_2012 <- read.csv("Data/Drug Metabolite Data/2012/WW-data-amphetamine-2012.csv")
cannabis_2012 <- read.csv("Data/Drug Metabolite Data/2012/WW-data-cannabis-2012.csv")
cocaine_2012 <- read.csv("Data/Drug Metabolite Data/2012/WW-data-cocaine-2012.csv")
MDMA_2012 <- read.csv("Data/Drug Metabolite Data/2012/WW-data-MDMA-2012.csv")
methamphetamine_2012 <- read.csv("Data/Drug Metabolite Data/2012/WW-data-methamphetamine-2012.csv")
# 2013 -- no cannabis data available
amphetamine_2013 <- read.csv("Data/Drug Metabolite Data/2013/WW-data-amphetamine-2013.csv")
cannabis_2013 <- read.csv("Data/Drug Metabolite Data/2013/WW-data-cannabis-2013.csv")
cocaine_2013 <- read.csv("Data/Drug Metabolite Data/2013/WW-data-cocaine-2013.csv")
MDMA_2013 <- read.csv("Data/Drug Metabolite Data/2013/WW-data-MDMA-2013.csv")
methamphetamine_2013 <- read.csv("Data/Drug Metabolite Data/2013/WW-data-methamphetamine-2013.csv")
# 2014 -- no cannabis data available
amphetamine_2014 <- read.csv("Data/Drug Metabolite Data/2014/WW-data-amphetamine-2014.csv")
cocaine_2014 <- read.csv("Data/Drug Metabolite Data/2014/WW-data-cocaine-2014.csv")
MDMA_2014 <- read.csv("Data/Drug Metabolite Data/2014/WW-data-MDMA-2014.csv")
methamphetamine_2014 <- read.csv("Data/Drug Metabolite Data/2014/WW-data-methamphetamine-2014.csv")
# 2015 -- no cannabis data available
amphetamine_2015 <- read.csv("Data/Drug Metabolite Data/2015/WW-data-amphetamine-2015.csv")
cocaine_2015 <- read.csv("Data/Drug Metabolite Data/2015/WW-data-cocaine-2015.csv")
MDMA_2015 <- read.csv("Data/Drug Metabolite Data/2015/WW-data-MDMA-2015.csv")
methamphetamine_2015 <- read.csv("Data/Drug Metabolite Data/2015/WW-data-methamphetamine-2015.csv")
# 2016 -- no cannabis data available
amphetamine_2016 <- read.csv("Data/Drug Metabolite Data/2016/WW-data-amphetamine-2016.csv")
cocaine_2016 <- read.csv("Data/Drug Metabolite Data/2016/WW-data-cocaine-2016.csv")
MDMA_2016 <- read.csv("Data/Drug Metabolite Data/2016/WW-data-MDMA-2016.csv")
methamphetamine_2016 <- read.csv("Data/Drug Metabolite Data/2016/WW-data-methamphetamine-2016.csv")
# 2017 -- no cannabis data available
amphetamine_2017 <- read.csv("Data/Drug Metabolite Data/2017/WW-data-amphetamine-2017.csv")
cocaine_2017 <- read.csv("Data/Drug Metabolite Data/2017/WW-data-cocaine-2017.csv")
MDMA_2017 <- read.csv("Data/Drug Metabolite Data/2017/WW-data-MDMA-2017.csv")
methamphetamine_2017 <- read.csv("Data/Drug Metabolite Data/2017/WW-data-methamphetamine-2017.csv")
# 2018
amphetamine_2018 <- read.csv("Data/Drug Metabolite Data/2018/WW-data-amphetamine-2018.csv")
cannabis_2018 <- read.csv("Data/Drug Metabolite Data/2018/WW-data-cannabis-2018.csv")
cocaine_2018 <- read.csv("Data/Drug Metabolite Data/2018/WW-data-cocaine-2018.csv")
MDMA_2018 <- read.csv("Data/Drug Metabolite Data/2018/WW-data-MDMA-2018.csv")
methamphetamine_2018 <- read.csv("Data/Drug Metabolite Data/2018/WW-data-methamphetamine-2018.csv")
# 2019
amphetamine_2019 <- read.csv("Data/Drug Metabolite Data/2019/WW-data-amphetamine-2019.csv")
cannabis_2019 <- read.csv("Data/Drug Metabolite Data/2019/WW-data-cannabis-2019.csv")
cocaine_2019 <- read.csv("Data/Drug Metabolite Data/2019/WW-data-cocaine-2019.csv")
MDMA_2019 <- read.csv("Data/Drug Metabolite Data/2019/WW-data-MDMA-2019.csv")
methamphetamine_2019 <- read.csv("Data/Drug Metabolite Data/2019/WW-data-methamphetamine-2019.csv")
# 2020
amphetamine_2020 <- read.csv("Data/Drug Metabolite Data/2020/WW-data-amphetamine-2020.csv")
cannabis_2020 <- read.csv("Data/Drug Metabolite Data/2020/WW-data-cannabis-2020.csv")
cocaine_2020 <- read.csv("Data/Drug Metabolite Data/2020/WW-data-cocaine-2020.csv")
MDMA_2020 <- read.csv("Data/Drug Metabolite Data/2020/WW-data-MDMA-2020.csv")
methamphetamine_2020 <- read.csv("Data/Drug Metabolite Data/2020/WW-data-methamphetamine-2020.csv")
# step 2: check data
head(amphetamine_2011); str(amphetamine_2011) # check the data## SiteID country city Wednesday Thursday Friday Saturday Sunday Monday
## 1 site14 BE Antwerp Zuid 229.63 211.52 318.89 318.79 311.03 272.62
## 2 site17 BE Brussels 22.92 28.77 39.31 57.57 44.62 32.38
## 3 site34 CZ Budweis 20.20 27.07 24.60 29.43 23.50 37.31
## 4 site55 ES Barcelona 7.40 12.14 13.21 25.29 19.50 15.94
## 5 site56 ES Castellon 0.00 0.00 0.00 0.00 0.00 0.00
## 6 site59 ES Santiago 33.99 35.46 41.20 40.04 30.08 NA
## Tuesday Weekday.mean Weekend.mean Daily.mean
## 1 277.78 239.64 305.33 277.18
## 2 22.18 24.63 43.47 35.39
## 3 37.43 28.23 28.71 28.51
## 4 14.41 11.31 18.48 15.41
## 5 0.00 0.00 0.00 0.00
## 6 22.29 30.58 37.11 33.84
## 'data.frame': 14 obs. of 13 variables:
## $ SiteID : chr "site14" "site17" "site34" "site55" ...
## $ country : chr "BE" "BE" "CZ" "ES" ...
## $ city : chr "Antwerp Zuid" "Brussels" "Budweis" "Barcelona" ...
## $ Wednesday : num 229.6 22.9 20.2 7.4 0 ...
## $ Thursday : num 211.5 28.8 27.1 12.1 0 ...
## $ Friday : num 318.9 39.3 24.6 13.2 0 ...
## $ Saturday : num 318.8 57.6 29.4 25.3 0 ...
## $ Sunday : num 311 44.6 23.5 19.5 0 ...
## $ Monday : num 272.6 32.4 37.3 15.9 0 ...
## $ Tuesday : num 277.8 22.2 37.4 14.4 0 ...
## $ Weekday.mean: num 239.6 24.6 28.2 11.3 0 ...
## $ Weekend.mean: num 305.3 43.5 28.7 18.5 0 ...
## $ Daily.mean : num 277.2 35.4 28.5 15.4 0 ...
Next, we imported the population age (PctYouth) data.
raw.PctYouth <- read.csv("Data/Age Data/raw.age.csv", skip = 16) # read .csv file
raw.PctYouth <- raw.PctYouth %>%
select(Region..subregion..country.or.area..,
Reference.date..as.of.1.July.,
Population.aged.15.24)
# select 3 variables of interest
head(raw.PctYouth); str(raw.PctYouth) # check data## Region..subregion..country.or.area.. Reference.date..as.of.1.July.
## 1 WORLD 1950
## 2 WORLD 1951
## 3 WORLD 1952
## 4 WORLD 1953
## 5 WORLD 1954
## 6 WORLD 1955
## Population.aged.15.24
## 1 18.2
## 2 18.1
## 3 18.0
## 4 17.8
## 5 17.7
## 6 17.6
## 'data.frame': 18105 obs. of 3 variables:
## $ Region..subregion..country.or.area..: chr "WORLD" "WORLD" "WORLD" "WORLD" ...
## $ Reference.date..as.of.1.July. : int 1950 1951 1952 1953 1954 1955 1956 1957 1958 1959 ...
## $ Population.aged.15.24 : chr "18.2" "18.1" "18.0" "17.8" ...
Finally, we imported the population wealth (GDP) data.
raw.gdp <- as.data.frame(na.omit(pivot_longer(data = read.csv("Data/GDP Data/raw.gdp.csv",
skip = 4),
cols = 5:65, names_to = "year", values_to = "GDP")))
#read .csv file; use pivot_longer() to transform to long format
head(raw.gdp); str(raw.gdp) #check data## Country.Name Country.Code Indicator.Name Indicator.Code year GDP
## 1 Aruba ABW GDP (current US$) NY.GDP.MKTP.CD X1986 405463417
## 2 Aruba ABW GDP (current US$) NY.GDP.MKTP.CD X1987 487602458
## 3 Aruba ABW GDP (current US$) NY.GDP.MKTP.CD X1988 596423607
## 4 Aruba ABW GDP (current US$) NY.GDP.MKTP.CD X1989 695304363
## 5 Aruba ABW GDP (current US$) NY.GDP.MKTP.CD X1990 764887117
## 6 Aruba ABW GDP (current US$) NY.GDP.MKTP.CD X1991 872138715
## 'data.frame': 12762 obs. of 6 variables:
## $ Country.Name : chr "Aruba" "Aruba" "Aruba" "Aruba" ...
## $ Country.Code : chr "ABW" "ABW" "ABW" "ABW" ...
## $ Indicator.Name: chr "GDP (current US$)" "GDP (current US$)" "GDP (current US$)" "GDP (current US$)" ...
## $ Indicator.Code: chr "NY.GDP.MKTP.CD" "NY.GDP.MKTP.CD" "NY.GDP.MKTP.CD" "NY.GDP.MKTP.CD" ...
## $ year : chr "X1986" "X1987" "X1988" "X1989" ...
## $ GDP : num 4.05e+08 4.88e+08 5.96e+08 6.95e+08 7.65e+08 ...
## - attr(*, "na.action")= 'omit' Named int [1:3464] 1 2 3 4 5 6 7 8 9 10 ...
## ..- attr(*, "names")= chr [1:3464] "1" "2" "3" "4" ...
After our data was imported into the R environment, our next step was to restructure the data to prepare for analysis. This is often referred to as ‘tidying’ the data (Beckerman et al., 2017). We do this individually for the drug usage, PctYouth, and GDP data in subsections Section 4.1, Section 4.2, and Section 4.3, respectively. The goal of this section was to build a long-wise table that presented the reference country, reference city, reference year, reference drug, and average weekday drug use for each observation in the sample. This final step is shown in subsection Section 4.5.
First, we tidied the drug usage data; this is shown below in 3 nested steps. These data were imported as individual .csv files for each type of drug and reference year in Section 3.2; we wanted to merge these observations into a single dataframe that contained observations of the average weekday drug usage corresponding to each type of drug from each combination of reference country and reference year in the sample.
The first step of tidying the drug usage data is shown below. We began by tidying each of the data corresponding to observations of the use of a reference drug from a single reference year. We repeated this step for each reference drug for a single reference year.
#step 1: 2011 amphetamine
amphetamine_2011.tidy <- amphetamine_2011 %>% #create data frame
select(c(country, city, Weekday.mean)) %>%
# subset country, city, weekday mean concentration
mutate(year = c(rep(2011, length(amphetamine_2011$country)))) %>%
# add variable for reference year
mutate(drug = c(rep("amphetamine", length(amphetamine_2011$country))))
# add variable for type of drug usage
amphetamine_2011.tidy$country <- as.factor(amphetamine_2011.tidy$country)
# make country a factor
levels(amphetamine_2011.tidy$country) <- c(BE = "Belgium", CZ = "Czech Republic", ES = "Spain",
FR = "France", B = "Great Britain", HR = "Hungary",
IT = "Italy", NL = "Netherlands", NO = "Norway",
SE = "Sweden")
# change country codes to names of countries
head(amphetamine_2011.tidy); str(amphetamine_2011.tidy) # check the data## country city Weekday.mean year drug
## 1 Belgium Antwerp Zuid 239.64 2011 amphetamine
## 2 Belgium Brussels 24.63 2011 amphetamine
## 3 Czech Republic Budweis 28.23 2011 amphetamine
## 4 Spain Barcelona 11.31 2011 amphetamine
## 5 Spain Castellon 0.00 2011 amphetamine
## 6 Spain Santiago 30.58 2011 amphetamine
## 'data.frame': 14 obs. of 5 variables:
## $ country : Factor w/ 10 levels "Belgium","Czech Republic",..: 1 1 2 3 3 3 4 5 6 7 ...
## $ city : chr "Antwerp Zuid" "Brussels" "Budweis" "Barcelona" ...
## $ Weekday.mean: num 239.6 24.6 28.2 11.3 0 ...
## $ year : num 2011 2011 2011 2011 2011 ...
## $ drug : chr "amphetamine" "amphetamine" "amphetamine" "amphetamine" ...
The next step was to build an annual drug usage summary, as shown below. This is a long-wise table showing the types and amount of drug usage observed in each reference city in the given reference year.
# 2011 cannabis
cannabis_2011.tidy <- cannabis_2011 %>% # create data frame
select(c(country, city, Weekday.mean)) %>%
# subset country, city, weekday mean concentration
mutate(year = c(rep(2011, length(cannabis_2011$country)))) %>%
# add variable for reference year
mutate(drug = c(rep("cannabis", length(cannabis_2011$country))))
# add variable for type of drug usage
cannabis_2011.tidy$country <- as.factor(cannabis_2011.tidy$country)
# make country a factor
levels(cannabis_2011.tidy$country) <- c(BE = "Belgium", CZ = "Czech Republic",
ES = "Spain",
FR = "France", HR = "Hungary", IT = "Italy",
NL = "Netherlands", SE = "Sweden")
# change country codes to names of countries
# 2011 cocaine
cocaine_2011.tidy <- cocaine_2011 %>% # create data frame
select(c(country, city, Weekday.mean)) %>%
# subset country, city, weekday mean concentration
mutate(year = c(rep(2011, length(cocaine_2011$country)))) %>%
# add variable for reference year
mutate(drug = c(rep("cocaine", length(cocaine_2011$country))))
# add variable for type of drug usage
cocaine_2011.tidy$country <- as.factor(cocaine_2011.tidy$country)
# make country a factor
levels(cocaine_2011.tidy$country) <- c(BE = "Belgium", CZ = "Czech Republic",
ES = "Spain",
FR = "France", GB = "Great Britain",
HR = "Hungary",
IT = "Italy", NL = "Netherlands",
NO = "Norway", SE = "Sweden")
# change country codes to names of countries
# 2011 MDMA
MDMA_2011.tidy <- MDMA_2011 %>% # create data frame
select(c(country, city, Weekday.mean)) %>%
# subset country, city, weekday mean concentration
mutate(year = c(rep(2011, length(MDMA_2011$country)))) %>%
# add variable for reference year
mutate(drug = c(rep("MDMA", length(MDMA_2011$country))))
# add variable for type of drug metabolie
MDMA_2011.tidy$country <- as.factor(MDMA_2011.tidy$country)
# make country a factor
levels(MDMA_2011.tidy$country) <- c(BE = "Belgium", CZ = "Czech Republic",
ES = "Spain", FR = "France",
GB = "Great Britain", HR = "Hungary",
IT = "Italy",
NL = "Netherlands", NO = "Norway", SE = "Sweden")
# change country codes to names of countries
methamphetamine_2011.tidy.temp <- methamphetamine_2011 %>% # create data frame
select(c(country, city, methamphetamineWDMean2011)) %>%
# note that weekday mean concentration has a different name in this file than all others
mutate(Weekday.mean = methamphetamine_2011$methamphetamineWDMean2011) %>%
# fix name of weekday mean concentration to be consistent with others
mutate(year = c(rep(2011, length(methamphetamine_2011$country)))) %>%
# add variable for reference year
mutate(drug = c(rep("methamphetamine", length(methamphetamine_2011$country))))
# add variable for type of drug usage
methamphetamine_2011.tidy <- select(methamphetamine_2011.tidy.temp, -c(methamphetamineWDMean2011))
# remove weekday mean variable with incorrect name
methamphetamine_2011.tidy$country <- as.factor(methamphetamine_2011.tidy$country)
# make country a factor
levels(methamphetamine_2011.tidy$country) <- c(BE = "Belgium", CZ = "Czech Republic",
ES = "Spain",
FR = "France", GB = "Great Britain",
HR = "Hungary",
IT = "Italy", NL = "Netherlands",
NO = "Norway",
SE = "Sweden")
# change country codes to names of countries
drug_2011 <- rbind(amphetamine_2011.tidy, cannabis_2011.tidy, cocaine_2011.tidy, MDMA_2011.tidy,
methamphetamine_2011.tidy)
# use rbind() to merge all drug usages in a single dataframe
head(drug_2011); str(drug_2011) #check the data## country city Weekday.mean year drug
## 1 Belgium Antwerp Zuid 239.64 2011 amphetamine
## 2 Belgium Brussels 24.63 2011 amphetamine
## 3 Czech Republic Budweis 28.23 2011 amphetamine
## 4 Spain Barcelona 11.31 2011 amphetamine
## 5 Spain Castellon 0.00 2011 amphetamine
## 6 Spain Santiago 30.58 2011 amphetamine
## 'data.frame': 71 obs. of 5 variables:
## $ country : Factor w/ 10 levels "Belgium","Czech Republic",..: 1 1 2 3 3 3 4 5 6 7 ...
## $ city : chr "Antwerp Zuid" "Brussels" "Budweis" "Barcelona" ...
## $ Weekday.mean: num 239.6 24.6 28.2 11.3 0 ...
## $ year : num 2011 2011 2011 2011 2011 ...
## $ drug : chr "amphetamine" "amphetamine" "amphetamine" "amphetamine" ...
The final step of tidying the drug usage data was to repeat the above steps for each reference year in the sample, and to merge the resulting 10 annual drug usage summaries into a single long-wise table adding the variable reference year. Again, this step is shown below.
# 2012
# amphetamine
amphetamine_2012.tidy <- amphetamine_2012 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2012, length(amphetamine_2012$country)))) %>%
mutate(drug = c(rep("amphetamine", length(amphetamine_2012$country))))
amphetamine_2012.tidy$country <- as.factor(amphetamine_2012.tidy$country)
levels(amphetamine_2012$country) <- c(BE = "Belgium", CH = "Switzerland",
CZ = "Czech Republic",
ES = "Spain", FI = "Finland", FR = "France",
GB = "Great Britain",
HR = "Hungary", IT = "Italy",
NL = "Netherlands",
NO = "Norway", SE = "Sweden")
# cannabis
cannabis_2012.tidy <- cannabis_2012 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2012, length(cannabis_2012$country)))) %>%
mutate(drug = c(rep("cannabis", length(cannabis_2012$country))))
cannabis_2012.tidy$country <- as.factor(cannabis_2012.tidy$country)
levels(cannabis_2012.tidy$country) <- c(BE = "Belgium", CZ = "Czech Republic",
ES = "Spain",
FR = "France", HR = "Hungary", IT = "Italy",
NL = "Netherlands", NO = "Norway",
SE = "Sweden")
# cocaine
cocaine_2012.tidy <- cocaine_2012 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2012, length(cocaine_2012$country)))) %>%
mutate(drug = c(rep("cocaine", length(cocaine_2012$country))))
cocaine_2012.tidy$country <- as.factor(cocaine_2012.tidy$country)
levels(cocaine_2012.tidy$country) <- c(BE = "Belgium", CH = "Switzerland",
CZ = "Czech Republic",
ES = "Spain", FI = "Finland", FR = "France",
HR = "Hungary",
IT = "Italy", NL = "Netherlands",
NO = "Norway", SE = "Sweden")
# MDMA
MDMA_2012.tidy <- MDMA_2012 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2012, length(MDMA_2012$country)))) %>%
mutate(drug = c(rep("MDMA", length(MDMA_2012$country))))
MDMA_2012.tidy$country <- as.factor(MDMA_2012.tidy$country)
levels(MDMA_2012.tidy$country) <- c(BE = "Belgium", CH = "Switzerland",
CZ = "Czech Republic",
ES = "Spain", FI = "Finland", HR = "Hungary",
IT = "Italy",
NL = "Netherlands", NO = "Norway", SE = "Sweden")
# methamphetamine
methamphetamine_2012.tidy <- methamphetamine_2012 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2012, length(methamphetamine_2012$country)))) %>%
mutate(drug = c(rep("methamphetamine", length(methamphetamine_2012$country))))
methamphetamine_2012.tidy$country <- as.factor(methamphetamine_2012.tidy$country)
levels(methamphetamine_2012.tidy$country) <- c(BE = "Belgium", CH = "Switzerland",
CZ = "Czech Republic", ES = "Spain",
FR = "France",
HR = "Hungary", IT = "Italy",
NL = "Netherlands",
NO = "Norway", SE = "Sweden")
# 2012 annual summary
drug_2012 <- rbind(amphetamine_2012.tidy, cannabis_2012.tidy, cocaine_2012.tidy, MDMA_2012.tidy,
methamphetamine_2012.tidy)
# 2013
# amphetamine
amphetamine_2013.tidy <- amphetamine_2013 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2013, length(amphetamine_2013$country)))) %>%
mutate(drug = c(rep("amphetamine", length(amphetamine_2013$country))))
amphetamine_2013.tidy$city <- as.factor(amphetamine_2013.tidy$city)
amphetamine_2013.tidy$country <- as.factor(amphetamine_2013.tidy$country)
levels(amphetamine_2013.tidy$country) <- c(BA = "Bosnia", BE = "Belgium",
CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic",
DE = "Germany",
DK = "Denmark", ES = "Spain",
FI = "Finland",
FR = "France", GB = "Great Britain",
GR = "Greece",
HR = "Hungary", NL = "Netherlands",
NO = "Norway",
PT = "Portugal", RO = "Romania",
RS = "Serbia",
SE = "Sweden", SK = "Slovakia")
# cannabis
cannabis_2013.tidy <- cannabis_2013 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2013, length(cannabis_2013$country)))) %>%
mutate(drug = c(rep("cannabis", length(cannabis_2013$country))))
cannabis_2013.tidy$city <- as.factor(cannabis_2013.tidy$city)
cannabis_2013.tidy$country <- as.factor(cannabis_2013.tidy$country)
levels(cannabis_2013.tidy$country) <- c(BE = "Belgium", CH = "Switzerland",
DE = "Germany",
DK = "Denmark", ES = "Spain", FR = "France",
GR = "Greece",
HR = "Hungary", IT = "Italy",
NL = "Netherlands",
NO = "Norway", PT = "Portugal")
# cocaine
cocaine_2013.tidy <- cocaine_2013 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2013, length(cocaine_2013$country)))) %>%
mutate(drug = c(rep("cocaine", length(cocaine_2013$country))))
cocaine_2013.tidy$city <- as.factor(cocaine_2013.tidy$city)
cocaine_2013.tidy$country <- as.factor(cocaine_2013.tidy$country)
levels(cocaine_2013.tidy$country) <- c(BA = "Bosnia", BE = "Belgium",
CH = "Switzerland", CY = "Cyprus",
CZ = "Czech Republic", DE = "Germany",
DK = "Denmark",
ES = "Spain", FR = "France",
GB = "Great Britain", GR = "Greece",
HR = "Hungary", IT = "Italy",
NL = "Netherlands", NO = "Norway",
PT = "Portugal", RO = "Romania",
RS = "Serbia", SE = "Sweden",
SK = "Slovakia")
# MDMA
MDMA_2013.tidy <- MDMA_2013 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2013, length(MDMA_2013$country)))) %>%
mutate(drug = c(rep("MDMA", length(MDMA_2013$country))))
MDMA_2013.tidy$city <- as.factor(MDMA_2013.tidy$city)
MDMA_2013.tidy$country <- as.factor(MDMA_2013.tidy$country)
levels(MDMA_2013.tidy$country) <- c(BA = "Bosnia", BE = "Belgium", CH = "Switzerland",
CY = "Cyprus",
CZ = "Czech Republic", DE = "Germany",
DK = "Denmark",
ES = "Spain", FI = "Finland", FR = "France",
GB = "Great Britain",
GR = "Greece", HR = "Hungary", IT = "Italy",
NL = "Netherlands",
NO = "Norway", PT = "Portugal", RO = "Romania",
RS = "Serbia",
SE = "Sweden", SK = "Slovakia")
# methamphetamine
methamphetamine_2013.tidy <- methamphetamine_2013 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2013, length(methamphetamine_2013$country)))) %>%
mutate(drug = c(rep("methamphetamine", length(methamphetamine_2013$country))))
methamphetamine_2013.tidy$city <- as.factor(methamphetamine_2013.tidy$city)
methamphetamine_2013.tidy$country <- as.factor(methamphetamine_2013.tidy$country)
levels(methamphetamine_2013.tidy$country) <- c(BA = "Bosnia", BE = "Belgium",
CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic",
DE = "Germany",
DK = "Denmark", ES = "Spain",
FR = "France",
GB = "Great Britain", GF = "Greece",
IT = "Italy",
NL = "Netherlands", NO = "Norway",
PT = "Portugal",
RO = "Romania", RS = "Serbia",
SE = "Sweden",
SK = "Slovakia")
# 2013 annual summary
drug_2013 <- rbind(amphetamine_2013.tidy, cannabis_2013.tidy, cocaine_2013.tidy, MDMA_2013.tidy,
methamphetamine_2013.tidy)
# 2014
# amphetamine
amphetamine_2014.tidy <- amphetamine_2014 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2014, length(amphetamine_2014$country)))) %>%
mutate(drug = c(rep("amphetamine", length(amphetamine_2014$country))))
amphetamine_2014.tidy$country <- as.factor(amphetamine_2014.tidy$country)
levels(amphetamine_2014.tidy$country) <- c(BE = "Belgium", CH = "Switzerland",
DE = "Germany",
ES = "Spain", FI = "Finland",
FR = "France",
GB = "Great Britain", GR = "Greece",
HR = "Hungary",
IT = "Italy", NL = "Netherlands",
PT = "Portugal")
# cocaine
cocaine_2014.tidy <- cocaine_2014 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2014, length(cocaine_2014$country)))) %>%
mutate(drug = c(rep("cocaine", length(cocaine_2014$country))))
cocaine_2014.tidy$country <- as.factor(cocaine_2014.tidy$country)
levels(cocaine_2014.tidy$country) <- c(BE = "Belgium", CH = "Switzerland",
DE = "Germany",
DK = "Denmark", ES = "Spain", FI = "Finland",
FR = "France", GB = "Great Britain",
GR = "Greece",
HR = "Hungary", IT = "Italy",
NL = "Netherlands",
NO = "Norway", PT = "Portugal")
# MDMA
MDMA_2014.tidy <- MDMA_2014 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2014, length(MDMA_2014$country)))) %>%
mutate(drug = c(rep("MDMA", length(MDMA_2014$country))))
MDMA_2014.tidy$city <- as.factor(MDMA_2014.tidy$city)
MDMA_2014.tidy$country <- as.factor(MDMA_2014.tidy$country)
levels(MDMA_2014.tidy$country) <- c(BE = "Belgium", CH = "Switzerland",
DE = "Germany", DK = "Denmark",
ES = "Spain", FI = "Finland", FR = "France",
GB = "Great Britain",
GR = "Greece", HR = "Hungary", IT = "Italy",
NL = "Netherlands",
NO = "Norway", PT = "Portugal")
# methamphetamine
methamphetamine_2014.tidy <- methamphetamine_2014 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2014, length(methamphetamine_2014$country)))) %>%
mutate(drug = c(rep("methamphetamine", length(methamphetamine_2014$country))))
methamphetamine_2014.tidy$city <- as.factor(methamphetamine_2014.tidy$city)
methamphetamine_2014.tidy$country <- as.factor(methamphetamine_2014.tidy$country)
levels(methamphetamine_2014.tidy$country) <- c(BE = "Belgium", CH = "Switzerland",
DE = "Germany",
DK = "Denmark", ES = "Spain",
FI = "Finland",
FR = "France", GB = "Great Britain",
GR = "Greece",
HR = "Hungary", IT = "Italy",
NL = "Netherlands",
NO = "Norway", PT = "Portugal")
# 2014 annual summary
drug_2014 <- rbind(amphetamine_2014.tidy, cocaine_2014.tidy, MDMA_2014.tidy, methamphetamine_2014.tidy)
# 2015
# amphetamine
amphetamine_2015.tidy <- amphetamine_2015 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2015, length(amphetamine_2015$country)))) %>%
mutate(drug = c(rep("amphetamine", length(amphetamine_2015$country))))
amphetamine_2015.tidy$country <- as.factor(amphetamine_2015.tidy$country)
levels(amphetamine_2015.tidy$country) <- c(AT = "Austria", BE = "Belgium",
CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic",
DE = "Germany",
DK = "Denmark", ES = "Spain",
FI = "Finland",
FR = "France", GB = "Great Britain",
GR = "Greece",
HR = "Hungary", IS = "Iceland",
IT = "Italy",
MT = "Malta", NL = "Netherlands",
NO = "Norway",
PT = "Portugal", SK = "Slovakia")
# cocaine
cocaine_2015.tidy <- cocaine_2015 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2015, length(cocaine_2015$country)))) %>%
mutate(drug = c(rep("cocaine", length(cocaine_2015$country))))
cocaine_2015.tidy$country <- as.factor(cocaine_2015.tidy$country)
levels(cocaine_2015.tidy$country) <- c(AT = "Austria", BE = "Belgium",
CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic",
DE = "Germany",
DK = "Denmark", ES = "Spain", FI = "Finland",
FR = "France",
GB = "Great Britain", GR = "Greece",
HR = "Hungary",
IS = "Iceland", IT = "Italy", MT = "Malta",
NL = "Netherlands",
NO = "Norway", PT = "Portugal",
SK = "Slovakia")
# MDMA
MDMA_2015.tidy <- MDMA_2015 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2015, length(MDMA_2015$country)))) %>%
mutate(drug = c(rep("MDMA", length(MDMA_2015$country))))
MDMA_2015.tidy$country <- as.factor(MDMA_2015.tidy$country)
levels(MDMA_2015.tidy$country) <- c(AT = "Austria", BE = "Belgium",
CH = "Switzerland", CY = "Cyprus",
CZ = "Czech Republic", DE = "Germany",
DK = "Denmark", ES = "Spain",
FI = "Finland", FR = "France",
GB = "Great Britain", GR = "Greece",
HR = "Hungary", IT = "Italy",
NL = "Netherlands", NO = "Norway",
PT = "Portugal", SK = "Slovakia")
# methamphetamine
methamphetamine_2015.tidy <- methamphetamine_2015 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2015, length(methamphetamine_2015$country)))) %>%
mutate(drug = c(rep("methamphetamine", length(methamphetamine_2015$country))))
methamphetamine_2015.tidy$country <- as.factor(methamphetamine_2015.tidy$country)
levels(methamphetamine_2015.tidy$country) <- c(AT = "Austria", BE = "Belgium",
CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic",
DE = "Germany",
DK = "Denmark", ES = "Spain",
FI = "Finland",
FR = "France", GB = "Great Britain",
GR = "Greece",
HR = "Hungary", IS = "Iceland",
IT = "Italy",
MT = "Malta", NL = "Netherlands",
NO = "Norway",
PT = "Portugal", SK = "Slovakia")
# 2015 annual summary
drug_2015 <- rbind(amphetamine_2015.tidy, cocaine_2015.tidy, MDMA_2015.tidy, methamphetamine_2015.tidy)
# 2016
# amphetamine
amphetamine_2016.tidy <- amphetamine_2016 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2016, length(amphetamine_2016$country)))) %>%
mutate(drug = c(rep("amphetamine", length(amphetamine_2016$country))))
amphetamine_2016.tidy$country <- as.factor(amphetamine_2016.tidy$country)
levels(amphetamine_2016.tidy$country) <- c(AT = "Austria", BE = "Belgium",
CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic",
DE = "Germany",
ES = "Spain", FI = "Finland",
FR = "France",
GB = "Great Britain", GR = "Greece",
HR = "Hungary",
IS = "Iceland", IT = "Italy",
NE = "Netherlands",
NL = "Netherlands", NO = "Norway",
PT = "Portugal",
SE = "Sweden", SK = "Slovakia")
# cocaine
cocaine_2016.tidy <- cocaine_2016 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2016, length(cocaine_2016$country)))) %>%
mutate(drug = c(rep("cocaine", length(cocaine_2016$country))))
cocaine_2016.tidy$country <- as.factor(cocaine_2016.tidy$country)
levels(cocaine_2016.tidy$country) <- c(AT = "Austria", BE = "Belgium",
CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic",
DE = "Germany",
ES = "Spain", FI = "Finland", FR = "France",
GB = "Great Britain", GR = "Greece",
HR = "Hungary",
IS = "Iceland", IT = "Italy",
NL = "Netherlands", NO = "Norway",
PT = "Portugal", SE = "Sweden",
SK = "Slovakia")
# MDMA
MDMA_2016.tidy <- MDMA_2016 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2016, length(MDMA_2016$country)))) %>%
mutate(drug = c(rep("MDMA", length(MDMA_2016$country))))
MDMA_2016.tidy$country <- as.factor(MDMA_2016.tidy$country)
levels(MDMA_2016.tidy$country) <- c(AT = "Austria", BE = "Belgium",
CH = "Switzerland", CY = "Cyprus",
CZ = "Czech Republic", DE = "Germany",
ES = "Spain", FI = "Finland",
FR = "France", GB = "Great Britain",
GR = "Greece", HR = "Hungary",
IS = "Iceland", IT = "Italy", NL = "Netherlands",
NO = "Norway",
PT = "Portugal", SE = "Sweden", SK = "Slovakia")
# methamphetamine
methamphetamine_2016.tidy <- methamphetamine_2016 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2016, length(methamphetamine_2016$country)))) %>%
mutate(drug = c(rep("methamphetamine", length(methamphetamine_2016$country))))
methamphetamine_2016.tidy$country <- as.factor(methamphetamine_2016.tidy$country)
levels(methamphetamine_2016.tidy$country) <- c(AT = "Austria", BE = "Belgium",
CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic",
DE = "Germany",
ES = "Spain", FI = "Finland",
FR = "France",
GB = "Great Britain", GR = "Greece",
HR = "Hungary",
IS = "Iceland", IT = "Italy",
NL = "Netherlands",
NO = "Norway", PT = "Portugal",
SE = "Sweden", SK = "Slovakia")
# 2016 annual summary
drug_2016 <- rbind(amphetamine_2016.tidy, cocaine_2016.tidy, MDMA_2016.tidy, methamphetamine_2016.tidy)
# 2017
# amphetamine
amphetamine_2017.tidy <- amphetamine_2017 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2017, length(amphetamine_2017$country)))) %>%
mutate(drug = c(rep("amphetamine", length(amphetamine_2017$country))))
amphetamine_2017.tidy$country <- as.factor(amphetamine_2017.tidy$country)
levels(amphetamine_2017.tidy$country) <- c(AT = "Austria", BE = "Belgium",
CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic",
DE = "Germany",
ES = "Spain", FI = "Finland",
FR = "France",
GB = "Great Britain", GR = "Greece",
HR = "Hungary",
IS = "Iceland", IT = "Italy",
LT = "Lithuania",
NL = "Netherlands", NO = "Norway",
PL = "Poland",
PT = "Portugal", SI = "Slovenia",
SK = "Slovakia")
# cocaine
cocaine_2017.tidy <- cocaine_2017 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2017, length(cocaine_2017$country)))) %>%
mutate(drug = c(rep("cocaine", length(cocaine_2017$country))))
cocaine_2017.tidy$country <- as.factor(cocaine_2017.tidy$country)
levels(cocaine_2017.tidy$country) <- c(AT = "Austria", BE = "Belgium",
CH = "Switzerland",
CZ = "Czech Republic", DE = "Germany",
ES = "Spain",
FI = "Finland", FR = "France",
GB = "Great Britain",
GR = "Greece", HR = "Hungary", IS = "Iceland",
IT = "Italy",
LT = "Lithuania", NL = "Netherlands",
NO = "Norway",
PT = "Portugal", SI = "Slovenia",
SK = "Slovakia")
# MDMA
MDMA_2017.tidy <- MDMA_2017 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2017, length(MDMA_2017$country)))) %>%
mutate(drug = c(rep("MDMA", length(MDMA_2017$country))))
MDMA_2017.tidy$country <- as.factor(MDMA_2017.tidy$country)
levels(MDMA_2017.tidy$country) <- c(AT = "Austria", BE = "Belgium",
CH = "Switzerland", CY = "Cyprus",
CZ = "Czech Republic", DE = "Germany",
ES = "Spain", FI = "Finland",
FR = "France", GB = "Great Britain",
GR = "Greece", HR = "Hungary",
IS = "Iceland", IT = "Italy", LT = "Lithuania",
NL = "Netherlands",
NO = "Norway", PL = "Poland", PT = "Portugal",
SI = "Slovenia", SK = "Slovakia")
# methamphetamine
methamphetamine_2017.tidy <- methamphetamine_2017 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2017, length(methamphetamine_2017$country)))) %>%
mutate(drug = c(rep("methamphetamine", length(methamphetamine_2017$country))))
methamphetamine_2017.tidy$country <- as.factor(methamphetamine_2017.tidy$country)
levels(methamphetamine_2017.tidy$country) <- c(AT = "Austria", BE = "Belgium",
CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic",
DE = "Germany",
ES = "Spain", FI = "Finland",
FR = "France",
GB = "Great Britain", GR = "Greece",
HR = "Hungary",
IT = "Italy", LT = "Lithuania",
NL = "Netherlands",
NO = "Norway", PL = "Poland",
PT = "Portugal",
SI = "Slovenia", SK = "Slovakia")
# 2017 annual summary
drug_2017 <- rbind(amphetamine_2017.tidy, cocaine_2017.tidy, MDMA_2017.tidy, methamphetamine_2017.tidy)
# 2018
# amphetamine
amphetamine_2018.tidy <- amphetamine_2018 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2018, length(amphetamine_2018$country)))) %>%
mutate(drug = c(rep("amphetamine", length(amphetamine_2018$country))))
amphetamine_2018.tidy$country <- as.factor(amphetamine_2018.tidy$country)
levels(amphetamine_2018.tidy$country) <- c(AT = "Austria", BE = "Belgium",
CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic",
DE = "Germany",
DK = "Denmark", ES = "Spain",
FI = "Finland", FR = "France",
GR = "Greece", HR = "Hungary",
IS = "Iceland", IT = "Italy",
LT = "Lithuania", NL = "Netherlands",
NO = "Norway",
PT = "Portugal", SI = "Slovenia",
SK = "Slovakia",
TR = "Turkey")
# cannabis
cannabis_2018.tidy <- cannabis_2018 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2018, length(cannabis_2018$country)))) %>%
mutate(drug = c(rep("cannabis", length(cannabis_2018$country))))
cannabis_2018.tidy$country <- as.factor(cannabis_2018.tidy$country)
levels(cannabis_2018.tidy$country) <- c(AT = "Austria", CZ = "Czech Republic",
ES = "Spain",
FR = "France", GR = "Greece", HR = "Hungary",
IS = "Iceland",
IT = "Italy", NL = "Netherlands",
PT = "Portugal",
SI = "Slovenia", SK = "Slovakia")
# cocaine
cocaine_2018.tidy <- cocaine_2018 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2018, length(cocaine_2018$country)))) %>%
mutate(drug = c(rep("cocaine", length(cocaine_2018$country))))
cocaine_2018.tidy$country <- as.factor(cocaine_2018.tidy$country)
levels(cocaine_2018.tidy$country) <- c(AT = "Austria", BE = "Belgium",
CH = "Switzerland",
CZ = "Czech Republic", DE = "Germany",
DK = "Denmark",
ES = "Spain", FI = "Finland", FR = "France",
GB = "Great Britain", HR = "Hungary",
IS = "Iceland",
IT = "Italy", LT = "Lithuania",
NL = "Netherlands",
NO = "Norway", PT = "Portugal",
SI = "Slovenia",
SK = "Slovakia")
# MDMA
MDMA_2018.tidy <- MDMA_2018 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2018, length(MDMA_2018$country)))) %>%
mutate(drug = c(rep("MDMA", length(MDMA_2018$country))))
MDMA_2018.tidy$country <- as.factor(MDMA_2018.tidy$country)
levels(MDMA_2018.tidy$country) <- c(AT = "Austria", BE = "Belgium",
CH = "Switzerland", CY = "Cyprus",
CZ = "Czech Republic", DE = "Germany",
DK = "Denmark", ES = "Spain",
FI = "Finland", FR = "France", GR = "Greece",
HR = "Hungary",
IS = "Iceland", IT = "Italy", LT = "Lithuania",
NL = "Netherlands",
NO = "Norway", PT = "Portugal", SI = "Slovenia",
SK = "Slovakia")
# methamphetamine
methamphetamine_2018.tidy <- methamphetamine_2018 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2018, length(methamphetamine_2018$country)))) %>%
mutate(drug = c(rep("methamphetamine", length(methamphetamine_2018$country))))
methamphetamine_2018.tidy$country <- as.factor(methamphetamine_2018.tidy$country)
levels(methamphetamine_2018.tidy$country) <- c(AT = "Austria", BE = "Belgium",
CY = "Cyprus",
CZ = "Czech Republic", DE = "Germany",
DK = "Denmark",
ES = "Spain", FI = "Finland",
FR = "France",
GB = "Great Britain", HR = "Hungary",
IS = "Iceland",
IT = "Italy", LT = "Lithuania",
NL = "Netherlands",
NO = "Norway", PT = "Portugal",
SI = "Slovenia",
SK = "Slovakia", TR = "Turkey")
# 2018 annual summary
drug_2018 <- rbind(amphetamine_2018.tidy, cannabis_2018.tidy, cocaine_2018.tidy, MDMA_2018.tidy,
methamphetamine_2018.tidy)
# 2019
# amphetamine
amphetamine_2019.tidy <- amphetamine_2019 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2019, length(amphetamine_2019$country)))) %>%
mutate(drug = c(rep("amphetamine", length(amphetamine_2019$country))))
amphetamine_2019.tidy$country <- as.factor(amphetamine_2019.tidy$country)
levels(amphetamine_2019.tidy$country) <- c(AT = "Austria", BE = "Belgium",
CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic",
DE = "Germany",
DK = "Denmark", ES = "Spain",
FI = "Finland", FR = "France",
HR = "Hungary", IS = "Iceland",
IT = "Italy",
LT = "Lithuania", LV = "Latvia",
NL = "Netherlands",
NO = "Norway", PL = "Poland",
PT = "Portugal", SE = "Sweden",
SI = "Slovenia", TR = "Turkey")
# cannabis
cannabis_2019.tidy <- cannabis_2019 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2019, length(cannabis_2019$country)))) %>%
mutate(drug = c(rep("cannabis", length(cannabis_2019$country))))
cannabis_2019.tidy$country <- as.factor(cannabis_2019.tidy$country)
levels(cannabis_2019.tidy$country) <- c(AT = "Austria", CH = "Switzerland",
CZ = "Czech Republic", ES = "Spain",
FR = "France", GR = "Greece", HR = "Hungary",
IS = "Iceland",
IT = "Italy", NL = "Netherlands",
PL = "Poland", PT = "Portugal",
SE = "Sweden", SI = "Slovenia",
SK = "Slovakia", TR = "Turkey")
# cocaine
cocaine_2019.tidy <- cocaine_2019 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2019, length(cocaine_2019$country)))) %>%
mutate(drug = c(rep("cocaine", length(cocaine_2019$country))))
cocaine_2019.tidy$country <- as.factor(cocaine_2019.tidy$country)
levels(cocaine_2019.tidy$country) <- c(AT = "Austria", BE = "Belgium",
CH = "Switzerland", CY = "Cyprus",
CZ = "Czech Republic", DE = "Germany",
DK = "Denmark",
ES = "Spain", FI = "Finland", FR = "France",
GB = "Great Britain", GR = "Greece",
HR = "Hungary", IS = "Iceland",
IT = "Italy", LT = "Lithuania",
LV = "Latvia", NL = "Netherlands",
NO = "Norway", PL = "Poland",
PT = "Portugal", SE = "Sweden",
SI = "Slovenia", SK = "Slovakia",
TR = "Turkey")
# MDMA
MDMA_2019.tidy <- MDMA_2019 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2019, length(MDMA_2019$country)))) %>%
mutate(drug = c(rep("MDMA", length(MDMA_2019$country))))
MDMA_2019.tidy$country <- as.factor(MDMA_2019.tidy$country)
levels(MDMA_2019.tidy$country) <- c(AT = "Austria", BE = "Belgium",
CH = "Switzerland", CY = "Cyprus",
CZ = "Czech Republic", DE = "Germany",
DK = "Denmark", ES = "Spain",
FI = "Finland", FR = "France",
GB = "Great Britain", GR = "Greece",
HR = "Hungary", IS = "Iceland", IT = "Italy",
LT = "Lithuania",
LV = "Latvia", NL = "Netherlands", NO = "Norway",
PL = "Poland",
PT = "Portugal", SE = "Sweden", SI = "Slovenia",
TR = "Turkey")
# methamphetamine
methamphetamine_2019.tidy <- methamphetamine_2019 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2019, length(methamphetamine_2019$country)))) %>%
mutate(drug = c(rep("methamphetamine", length(methamphetamine_2019$country))))
methamphetamine_2019.tidy$country <- as.factor(methamphetamine_2019.tidy$country)
levels(methamphetamine_2019.tidy$country) <- c(AT = "Austria", BE = "Belgium",
CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic",
DE = "Germany",
DK = "Denmark", ES = "Spain",
FI = "Finland", FR = "France",
GB = "Great Britain", GR = "Greece",
HR = "Hungary",
IS = "Iceland", IT = "Italy",
LT = "Lithuania", LV = "Latvia",
NL = "Netherlands", NO = "Norway",
PL = "Poland", PT = "Portugal",
SE = "Sweden", SI = "Slovenia",
TR = "Turkey")
# 2019 annual summary
drug_2019 <- rbind(amphetamine_2019.tidy, cannabis_2019.tidy, cocaine_2019.tidy, MDMA_2019.tidy,
methamphetamine_2019.tidy)
# 2020
# amphetamine
amphetamine_2020.tidy <- amphetamine_2020 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2020, length(amphetamine_2020$country)))) %>%
mutate(drug = c(rep("amphetamine", length(amphetamine_2020$country))))
amphetamine_2020.tidy$country <- as.factor(amphetamine_2020.tidy$country)
levels(amphetamine_2020.tidy$country) <- c(AT = "Austria", BE = "Belgium",
CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic",
DE = "Germany",
DK = "Denmark", ES = "Spain",
FI = "Finland", FR = "France",
HR = "Hungary", IS = "Iceland",
IT = "Italy",
LT = "Lithuania", LV = "Latvia",
NL = "Netherlands",
NO = "Norway", PL = "Poland",
PT = "Portugal", SE = "Sweden",
SI = "Slovenia", TR = "Turkey")
# cannabis
cannabis_2020.tidy <- cannabis_2020 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2020, length(cannabis_2020$country)))) %>%
mutate(drug = c(rep("cannabis", length(cannabis_2020$country))))
cannabis_2020.tidy$country <- as.factor(cannabis_2020.tidy$country)
levels(cannabis_2020.tidy$country) <- c(AT = "Austria", CH = "Switzerland",
CZ = "Czech Republic", ES = "Spain",
FR = "France", GR = "Greece", HR = "Hungary",
IS = "Iceland",
IT = "Italy", NL = "Netherlands",
PL = "Poland", PT = "Portugal",
SE = "Sweden", SI = "Slovenia",
SK = "Slovakia", TR = "Turkey")
# cocaine
cocaine_2020.tidy <- cocaine_2020 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2020, length(cocaine_2020$country)))) %>%
mutate(drug = c(rep("cocaine", length(cocaine_2020$country))))
cocaine_2020.tidy$country <- as.factor(cocaine_2020.tidy$country)
levels(cocaine_2020.tidy$country) <- c(AT = "Austria", BE = "Belgium",
CH = "Switzerland", CY = "Cyprus",
CZ = "Czech Republic", DE = "Germany",
DK = "Denmark",
ES = "Spain", FI = "Finland", FR = "France",
GB = "Great Britain", GR = "Greece",
HR = "Hungary", IS = "Iceland",
IT = "Italy", LT = "Lithuania", LV = "Latvia",
NL = "Netherlands",
NO = "Norway", PL = "Poland", PT = "Portugal",
SE = "Sweden",
SI = "Slovenia", SK = "Slovakia",
TR = "Turkey")
# MDMA
MDMA_2020.tidy <- MDMA_2020 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2020, length(MDMA_2020$country)))) %>%
mutate(drug = c(rep("MDMA", length(MDMA_2020$country))))
MDMA_2020.tidy$country <- as.factor(MDMA_2020.tidy$country)
levels(MDMA_2020.tidy$country) <- c(AT = "Austria", BE = "Belgium",
CH = "Switzerland", CY = "Cyprus",
CZ = "Czech Republic", DE = "Germany",
DK = "Denmark", ES = "Spain",
FI = "Finland", FR = "France",
GB = "Great Britain", GR = "Greece",
HR = "Hungary", IS = "Iceland", IT = "Italy",
LT = "Lithuania",
LV = "Latvia", NL = "Netherlands", NO = "Norway",
PL = "Poland",
PT = "Portugal", SE = "Sweden", SI = "Slovenia",
TR = "Turkey")
# methamphetamine
methamphetamine_2020.tidy <- methamphetamine_2020 %>%
select(country, city, Weekday.mean) %>%
mutate(year = c(rep(2020, length(methamphetamine_2020$country)))) %>%
mutate(drug = c(rep("methamphetamine", length(methamphetamine_2020$country))))
methamphetamine_2020.tidy$country <- as.factor(methamphetamine_2020.tidy$country)
levels(methamphetamine_2020.tidy$country) <- c(AT = "Austria", BE = "Belgium",
CH = "Switzerland",
CY = "Cyprus", CZ = "Czech Republic",
DE = "Germany",
DK = "Denmark", ES = "Spain",
FI = "Finland", FR = "France",
GB = "Great Britain", GR = "Greece",
HR = "Hungary",
IS = "Iceland", IT = "Italy",
LT = "Lithuania", LV = "Latvia",
NL = "Netherlands", NO = "Norway",
PL = "Poland", PT = "Portugal",
SE = "Sweden", SI = "Slovenia",
TR = "Turkey")
# 2020 annual summary
drug_2020 <- rbind(amphetamine_2020.tidy, cannabis_2020.tidy, cocaine_2020.tidy, MDMA_2020.tidy,
methamphetamine_2020.tidy)
# total summary
drug <- rbind(drug_2011, drug_2012, drug_2013, drug_2014, drug_2015, drug_2016, drug_2017, drug_2018,
drug_2019, drug_2020)
# merge with rbind()
drug.tidy <- drug[drug$country == "Austria" | drug$country == "Belgium" |
drug$country == "Finland" |
drug$country == "Germany" | drug$country == "Sweden" |
drug$country == "Switzerland" |
drug$country == "Spain" | drug$country == "Slovenia",]
# subset 8 countries of interest
drug.tidy$year <- as.integer(drug.tidy$year) # transform year as integer
drug.tidy$drug <- as.factor(drug.tidy$drug) # transform drug as factor
head(drug.tidy); str(drug.tidy) # check the data## country city Weekday.mean year drug
## 1 Belgium Antwerp Zuid 239.64 2011 amphetamine
## 2 Belgium Brussels 24.63 2011 amphetamine
## 4 Spain Barcelona 11.31 2011 amphetamine
## 5 Spain Castellon 0.00 2011 amphetamine
## 6 Spain Santiago 30.58 2011 amphetamine
## 14 Sweden Umea 16.57 2011 amphetamine
## 'data.frame': 1401 obs. of 5 variables:
## $ country : Factor w/ 40 levels "Belgium","Czech Republic",..: 1 1 3 3 3 10 1 3 3 3 ...
## $ city : chr "Antwerp Zuid" "Brussels" "Barcelona" "Castellon" ...
## $ Weekday.mean: num 239.6 24.6 11.3 0 30.6 ...
## $ year : int 2011 2011 2011 2011 2011 2011 2011 2011 2011 2011 ...
## $ drug : Factor w/ 5 levels "amphetamine",..: 1 1 1 1 1 1 2 2 2 2 ...
Next, we tidied the population age (PctYouth) data. These data were already formatted to include the variables we were interested in: reference country, reference date, and PctYouth. Thus, we tidied the PctYouth data in three simple steps. First, we renamed the variables; next we reformatted the data types; and finally we removed the observations outside the scope of our study.
# step 1: rename variables
raw.PctYouth.tidy <- raw.PctYouth # create dataframe
colnames(raw.PctYouth.tidy) <- c("country", "year", "PctYouth") # fix col names
# age = percentage of population between the ages of 15 and 24
# step 2: reformat data
raw.PctYouth.tidy$PctYouth <- as.numeric(raw.PctYouth.tidy$PctYouth) # make PctYouth numeric
# NAs produced correspond to reference countries that we will remove (outside of scope of study)
raw.PctYouth.tidy$year <- as.integer(raw.PctYouth.tidy$year) # make year an integer
raw.PctYouth.tidy$country <- as.factor(raw.PctYouth.tidy$country)
# step 3: remove observations
PctYouth <- raw.PctYouth.tidy[raw.PctYouth.tidy$country == "Austria" |
raw.PctYouth.tidy$country == "Belgium" |
raw.PctYouth.tidy$country == "Finland" |
raw.PctYouth.tidy$country == "Germany" |
raw.PctYouth.tidy$country == "Sweden" |
raw.PctYouth.tidy$country == "Switzerland" |
raw.PctYouth.tidy$country == "Spain" |
raw.PctYouth.tidy$country == "Slovenia",]
# subset only the 8 reference countries we're interested in
PctYouth.tidy <- PctYouth[PctYouth$year == 2011 | PctYouth$year == 2012 |
PctYouth$year == 2013 | PctYouth$year == 2014 |
PctYouth$year == 2015 | PctYouth$year == 2016 |
PctYouth$year == 2017 | PctYouth$year == 2018 |
PctYouth$year == 2019 | PctYouth$year == 2020, ]
# subset only the reference dates we're interested in
# check the (tidied) PctYouth data
head(PctYouth.tidy); str(PctYouth.tidy) ## country year PctYouth
## 15895 Finland 2011 12.3
## 15896 Finland 2012 12.2
## 15897 Finland 2013 12.1
## 15898 Finland 2014 12.0
## 15899 Finland 2015 11.9
## 15900 Finland 2016 11.7
## 'data.frame': 80 obs. of 3 variables:
## $ country : Factor w/ 255 levels "Afghanistan",..: 79 79 79 79 79 79 79 79 79 79 ...
## $ year : int 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 ...
## $ PctYouth: num 12.3 12.2 12.1 12 11.9 11.7 11.5 11.3 11.1 11 ...
Then, we tidied the population wealth (GDP) data. We imported these data as a long-wise table in Section 3.2. Thus, we tidied the GDP data in three steps. First, we renamed the GDP columns and created a subset of our variables of interest. Next, we removed the observations that were outside the scope of our study. Finally, we reformatted the variable types.
# step 1: rename cols and subset vars of interest
raw.gdp.temp <- na.omit(raw.gdp) #create dataframe
colnames(raw.gdp.temp) <- c("country", "country.code", "indicator.name",
"indicator.code", "year", "GDP")
# update column names
raw.gdp.tidy <- raw.gdp.temp %>% select(country, year, GDP)
# subset country, year, GDP
# step 2: remove observations
gdp <- raw.gdp.tidy[raw.gdp.tidy$country == "Austria" |
raw.gdp.tidy$country == "Belgium" |
raw.gdp.tidy$country == "Finland" |
raw.gdp.tidy$country == "Germany" |
raw.gdp.tidy$country == "Sweden" |
raw.gdp.tidy$country == "Switzerland" |
raw.gdp.tidy$country == "Spain" |
raw.gdp.tidy$country == "Slovenia",]
# use logical statements to subset 8 countries from sampling frame
gdp.tidy <- as.data.frame(gdp[gdp$year == "X2011" | gdp$year == "X2012" |
gdp$year == "X2013" | gdp$year == "X2014" |
gdp$year == "X2015" | gdp$year == "X2016" |
gdp$year == "X2017" | gdp$year == "X2018" |
gdp$year == "X2019" | gdp$year == "X2020",])
# use logical statements to subset 8 years from sampling frame
gdp.tidy$year <- as.factor(gdp.tidy$year) # transform year to factor to fix formatting of values
levels(gdp.tidy$year) <- c(2011, 2012, 2013, 2014, 2015, 2016, 2017,
2018, 2019, 2020) # redefine values
gdp.tidy$year <- as.integer(gdp.tidy$year) # transform back to integer
gdp.tidy["year"] <- gdp.tidy$year + 2010 # add 2010 to fix misread values
# step 3: reformat vars
gdp.tidy$country <- as.factor(gdp.tidy$country)
# transform country to factor
gdp.tidy$year <- as.integer(gdp.tidy$year)
# transform year to integer
# check the data
head(gdp.tidy); str(gdp.tidy) ## country year GDP
## 684 Austria 2011 4.31120e+11
## 685 Austria 2012 4.09425e+11
## 686 Austria 2013 4.30069e+11
## 687 Austria 2014 4.41996e+11
## 688 Austria 2015 3.81818e+11
## 689 Austria 2016 3.95569e+11
## 'data.frame': 80 obs. of 3 variables:
## $ country: Factor w/ 8 levels "Austria","Belgium",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ year : int 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 ...
## $ GDP : num 4.31e+11 4.09e+11 4.30e+11 4.42e+11 3.82e+11 ...
## - attr(*, "na.action")= 'omit' Named int [1:3464] 1 2 3 4 5 6 7 8 9 10 ...
## ..- attr(*, "names")= chr [1:3464] "1" "2" "3" "4" ...
Finally, the last step of tidying the data was to merge the three variables we tidied – drug usage, PctYouth, and GDP – into a single dataframe. Since each dataframe included information about the reference country and reference year corresponding to each observation, we merged the dataframes by a combination of reference country and reference year (i.e. we will match values of age, GDP, and drug usage from each of the respective dataframes by the variables reference country and reference year).
data.temp <- (left_join(y = PctYouth.tidy, x = drug.tidy, by = c('country', 'year')))
# merge 1: age and drug usage data
data <- (left_join(y = gdp.tidy, x = data.temp, by = c('country', 'year')))
# merge 2: result of merge 1 and GDP data
head(data); str(data) # check the data## country city Weekday.mean year drug PctYouth GDP
## 1 Belgium Antwerp Zuid 239.64 2011 amphetamine 12.1 5.22646e+11
## 2 Belgium Brussels 24.63 2011 amphetamine 12.1 5.22646e+11
## 3 Spain Barcelona 11.31 2011 amphetamine 10.2 1.47877e+12
## 4 Spain Castellon 0.00 2011 amphetamine 10.2 1.47877e+12
## 5 Spain Santiago 30.58 2011 amphetamine 10.2 1.47877e+12
## 6 Sweden Umea 16.57 2011 amphetamine 13.5 5.74094e+11
## 'data.frame': 1401 obs. of 7 variables:
## $ country : Factor w/ 269 levels "Belgium","Czech Republic",..: 1 1 3 3 3 10 1 3 3 3 ...
## $ city : chr "Antwerp Zuid" "Brussels" "Barcelona" "Castellon" ...
## $ Weekday.mean: num 239.6 24.6 11.3 0 30.6 ...
## $ year : int 2011 2011 2011 2011 2011 2011 2011 2011 2011 2011 ...
## $ drug : Factor w/ 5 levels "amphetamine",..: 1 1 1 1 1 1 2 2 2 2 ...
## $ PctYouth : num 12.1 12.1 10.2 10.2 10.2 13.5 12.1 10.2 10.2 10.2 ...
## $ GDP : num 5.23e+11 5.23e+11 1.48e+12 1.48e+12 1.48e+12 ...
In the last section, we tidied our data so that it was properly formatted for analysis. Before we proceed with our analysis, it was important to first look at the distributions of variables in our data and to examine the relationships that existed between our variables. We do this first with the drug usage data in Section 5.1, next for the PctYouth data in Section 5.2, and then for the GDP data in Section 5.3. Then, in Section 5.4, we check the relationship between PctYouth and log(GDP). In Section 5.5, we finalize the data we will use in our analysis, presented in Section 6.
Drug usage is a quantitative, continuously distributed variable. Thus, we created a series of 3 boxplots to assess the normality and distribution of these data. First, in Figure 5.1a, we present a boxplot of drug usage; in Figure 5.1b, we present a boxplot of drug usage ~ reference country; finally, in Figure 5.1c, we present a boxplot of drug use ~ reference drug. It is important to note that the distribution of drug usage is heavily right skewed (Figure 5.1a), and a number of potential outliers are shown by open circles in Figures 5.1a - 5.1c. Thus, we chose to log-transform drug usage to create a new variable, log(drug usage).
boxplot(data$Weekday.mean, col = NULL, main = "Figure 5.1a",
ylab = "Drug Usage (mg/1000 people/day)", range = 1.5) bwplot(data$Weekday.mean~data$country, strip = strip.custom(bg = 'white'),
cex = 0.5, cex.lab=1.5, cex.axis=1.5, cex.main=1.5, layout = c(1, 1),
main = "Figure 5.1b",
xlab = "Reference Country", ylab = "Drug Usage (mg/1000 people/day)",
par.settings = list(
box.rectangle = list(col = 1),
box.umbrella = list(col = 1),
plot.symbol = list(col=1),
plot.symbol = list(cex = .5)),
scales = list(x = list(relation = "same"),
y = list(relation = "same")))
bwplot(data$Weekday.mean~data$drug, strip = strip.custom(bg = 'white'),
cex = .5, layout = c(1, 1),
main = "Figure 5.1c",
xlab = "Reference Drug", ylab = "Drug Usage (mg/1000 people/day)",
par.settings = list(
box.rectangle = list(col = 1),
box.umbrella = list(col = 1),
plot.symbol = list(col=1),
plot.symbol = list(cex = .5)),
scales = list(x = list(relation = "same"),
y = list(relation = "same"))) Next, we examine the distributions of log(drug use). log(drug use) is presented in Figure 5.1d, log(drug use) ~ reference country is presented in Figure 6.1e, and log(drug use) ~ reference drug is presented in Figure 5.1f. As shown in Figure 5.1d, there are no potential outliers identified in the boxplot of log(drug use). Although several potential outliers are highlighted in the below plots with open circles, these points are relatively few in number and small in magnitude when compared to distributions of the drug use data before the log-transformation (Figures 5.1a-c). Thus, these plots suggest the assumption is met for the log(drug use) data.
Z <- data$Weekday.mean+10 # to avoid NA errors
temp <- log(10 + data$Weekday.mean)
data["log.Weekday.mean"] <- temp
data$log.Weekday.mean[is.na(data$log.Weekday.mean)] <- 0
boxplot(data$log.Weekday.mean, col = NULL, main = "Figure 5.1d",
ylab = "Drug Usage (log(mg/1000 people/day))", range = 1.5) bwplot(data$log.Weekday.mean~data$country, strip = strip.custom(bg = 'white'),
cex = 0.5, cex.lab=1.5, cex.axis=1.5, cex.main=1.5, layout = c(1, 1),
main = "Figure 5.1e",
xlab = "Reference Country", ylab = "Drug Usage (log(mg/1000 people/day))",
par.settings = list(
box.rectangle = list(col = 1),
box.umbrella = list(col = 1),
plot.symbol = list(col=1),
plot.symbol = list(cex = .5)),
scales = list(x = list(relation = "same"),
y = list(relation = "same")))
bwplot(data$log.Weekday.mean~data$drug, strip = strip.custom(bg = 'white'),
cex = .5, layout = c(1, 1),
main = "Figure 5.1f",
xlab = "Reference Drug", ylab = "Drug Usage (log(mg/1000 people/day))",
par.settings = list(
box.rectangle = list(col = 1),
box.umbrella = list(col = 1),
plot.symbol = list(col=1),
plot.symbol = list(cex = .5)),
scales = list(x = list(relation = "same"),
y = list(relation = "same"))) Like drug usage, PctYouth is a quantitative, continuously distributed variable. Thus, we created a series of 3 boxplots to assess the normality and distribution of these data. Next, we examine the distributions of PctYouth. PctYouth is presented in Figure 5.2a, PctYouth ~ reference country is presented in Figure 5.2b, and PctYouth ~ reference drug is presented in Figure 5.4c. As shown in Figure 5.4c, there is one potential outliers identified in the boxplot of PctYouth. Again, these potential outliers are highlighted with open circles, and these points are relatively few in number and small in magnitude. Thus, these plots suggest the assumption is met for the PctYouth data.
boxplot(data$PctYouth, col = NULL, main = "Figure 5.2a",
ylab = "Population Aged 15-24 (Percent)", range = 1.5) bwplot(data$PctYouth~data$country, strip = strip.custom(bg = 'white'),
cex = 0.5, cex.lab=1.5, cex.axis=1.5, cex.main=1.5, layout = c(1, 1),
main = "Figure 5.2b",
xlab = "Reference Country", ylab = "Population Aged 15-24 (Percent)",
par.settings = list(
box.rectangle = list(col = 1),
box.umbrella = list(col = 1),
plot.symbol = list(col=1),
plot.symbol = list(cex = .5)),
scales = list(x = list(relation = "same"),
y = list(relation = "same")))
bwplot(data$PctYouth~data$drug, strip = strip.custom(bg = 'white'),
cex = .5, layout = c(1, 1),
main = "Figure 5.2c",
xlab = "Reference Drug", ylab = "Population Aged 15-24 (Percent)",
par.settings = list(
box.rectangle = list(col = 1),
box.umbrella = list(col = 1),
plot.symbol = list(col=1),
plot.symbol = list(cex = .5)),
scales = list(x = list(relation = "same"),
y = list(relation = "same"))) Like drug usage and PctYouth, GDP is a quantitative, continuously distributed variable. Thus, we created a series of 3 boxplots to assess the normality and distribution of these data. First, in Figure 5.3a, we present a boxplot of GDP. Next, in Figure 5.3b, we present a boxplot of GDP ~ reference country. Finally, in Figure 5.3c, we present a boxplot of GDP ~ reference drug. It is important to note that the distribution of GDP is right skewed (Figure 5.3a), and a number of potential outliers are shown by open circles in Figures 5.3a - 5.3c. Thus, we chose to log-transform GDP to create a new variable, log(GDP).
boxplot(data$GDP, col = NULL, main = "Figure 5.3a",
ylab = "GDP (USD)", range = 1.5) bwplot(data$GDP~data$country, strip = strip.custom(bg = 'white'),
cex = 0.5, cex.lab=1.5, cex.axis=1.5, cex.main=1.5, layout = c(1, 1),
main = "Figure 5.3b",
xlab = "Reference Country", ylab = "(USD)",
par.settings = list(
box.rectangle = list(col = 1),
box.umbrella = list(col = 1),
plot.symbol = list(col=1),
plot.symbol = list(cex = .5)),
scales = list(x = list(relation = "same"),
y = list(relation = "same")))
bwplot(data$GDP~data$drug, strip = strip.custom(bg = 'white'),
cex = .5, layout = c(1, 1),
main = "Figure 5.3c",
xlab = "Reference Drug", ylab = "(USD)",
par.settings = list(
box.rectangle = list(col = 1),
box.umbrella = list(col = 1),
plot.symbol = list(col=1),
plot.symbol = list(cex = .5)),
scales = list(x = list(relation = "same"),
y = list(relation = "same"))) Finally, we assess the normality of log(GDP). log(GDP) is presented in Figure 5.3d, log(GDP) ~ reference country is presented in Figure 5.3e, and log(GDP) ~ reference drug is presented in Figure 5.3f. As shown in Figure 5.3d, there are no potential outliers identified in the boxplot of PctYouth. Potential outliers shown in Figure 5.3e and Figure 5.3f, highlighted with open circles, are relatively few in number and small in magnitude. Thus, these plots suggest the assumption is met for the log(GDP) data.
Y <- data$GDP
temp2 <- log(data$GDP)
data["log.GDP"] <- temp2
data$log.GDP[is.na(data$log.GDP)] <- 0
boxplot(data$log.GDP, col = NULL, main = "Figure 5.3d",
ylab = "GDP (log(USD))", range = 1.5) bwplot(data$log.GDP~data$country, strip = strip.custom(bg = 'white'),
cex = 0.5, cex.lab=1.5, cex.axis=1.5, cex.main=1.5, layout = c(1, 1),
main = "Figure 5.3e",
xlab = "Reference Country", ylab = "GDP (log(USD))",
par.settings = list(
box.rectangle = list(col = 1),
box.umbrella = list(col = 1),
plot.symbol = list(col=1),
plot.symbol = list(cex = .5)),
scales = list(x = list(relation = "same"),
y = list(relation = "same")))
bwplot(data$log.GDP~data$drug, strip = strip.custom(bg = 'white'),
cex = .5, layout = c(1, 1),
main = "Figure 5.3f",
xlab = "Reference Drug", ylab = "GDP (log(USD))",
par.settings = list(
box.rectangle = list(col = 1),
box.umbrella = list(col = 1),
plot.symbol = list(col=1),
plot.symbol = list(cex = .5)),
scales = list(x = list(relation = "same"),
y = list(relation = "same"))) In the following subsection, we present a pair of multi-pannel scatterplots to assess the relationship between PctYouth and log(GDP) within each reference country (Figure 5.5a) and for each reference drug (Figure 5.5b). These figures suggest that colinearity is not of concern, as there is no clear trend in these plots.
xyplot(data$log.GDP ~ data$PctYouth | data$country, col = 1, type = c("p", "r"),
strip = function(bg='white',...) strip.default(bg='white',...),
scales = list(alternating = T,
x = list(relation = "free"),
y = list(relation = "same")),
main = "Figure 5.5a",
xlab = "Population Aged 15-24 (Percent)",
par.strip.text = list(cex = 0.8),
ylab = "GDP (log(USD))",)
xyplot(data$log.GDP ~ data$PctYouth | data$drug, col = 1, type = c("p", "r"),
strip = function(bg='white',...) strip.default(bg='white',...),
scales = list(alternating = T,
x = list(relation = "free"),
y = list(relation = "same")),
main = "Figure 5.5b",
xlab = "Population Aged 15-24 (Percent)",
par.strip.text = list(cex = 0.8),
ylab = "GDP (log(USD))",)Thus, in Section 6 we will proceed with our analysis and construct a linear model of log(drug usage) by PctYouth and log(GDP), based on the distributions explored in the above subsections. In this final subsection, we finalize the data for analysis.
data <- data %>% select(-Weekday.mean, -GDP) # remove non-transformed variables
head(data); str(data) # check data## country city year drug PctYouth log.Weekday.mean log.GDP
## 1 Belgium Antwerp Zuid 2011 amphetamine 12.1 5.520020 26.98217
## 2 Belgium Brussels 2011 amphetamine 12.1 3.544720 26.98217
## 3 Spain Barcelona 2011 amphetamine 10.2 3.059176 28.02223
## 4 Spain Castellon 2011 amphetamine 10.2 2.302585 28.02223
## 5 Spain Santiago 2011 amphetamine 10.2 3.703275 28.02223
## 6 Sweden Umea 2011 amphetamine 13.5 3.279783 27.07606
## 'data.frame': 1401 obs. of 7 variables:
## $ country : Factor w/ 269 levels "Belgium","Czech Republic",..: 1 1 3 3 3 10 1 3 3 3 ...
## $ city : chr "Antwerp Zuid" "Brussels" "Barcelona" "Castellon" ...
## $ year : int 2011 2011 2011 2011 2011 2011 2011 2011 2011 2011 ...
## $ drug : Factor w/ 5 levels "amphetamine",..: 1 1 1 1 1 1 2 2 2 2 ...
## $ PctYouth : num 12.1 12.1 10.2 10.2 10.2 13.5 12.1 10.2 10.2 10.2 ...
## $ log.Weekday.mean: num 5.52 3.54 3.06 2.3 3.7 ...
## $ log.GDP : num 27 27 28 28 28 ...
In the following section, we proceeded with our analysis to test our hypothesis that the effect of country on drug usage is a function of PctYouth and GDP. We present our analysis following the Plot -> Model -> Check Assumptions -> Interpret -> Plot Again workflow outlined in Beckerman et al. (2017).
In this subsection, we look at amphetamine usage. To perform this analysis, we created a subset of only the amphetamine data in a new dataframe called amphetplot. We present the summary statistics (mean, sample size, and standard error by country) for the new dataframe below.
#Create new dataframe containing only data pertaining to amphetamine
amphetplot <- data %>%
filter(drug == "amphetamine") %>%
select(country, PctYouth, log.Weekday.mean, log.GDP)
#Create dataframe with mean amphetamine levels, sample size, and standard error by country
plot3 <- amphetplot %>%
group_by(country) %>%
summarise(
mean = mean(log.Weekday.mean, na.rm = TRUE),
samplesize = n(),
SE = std.error(log.Weekday.mean)
)
plot3## # A tibble: 8 × 4
## country mean samplesize SE
## <fct> <dbl> <int> <dbl>
## 1 Belgium 4.68 44 0.125
## 2 Spain 3.65 63 0.140
## 3 Sweden 5.16 13 0.327
## 4 Switzerland 3.43 43 0.0939
## 5 Finland 4.17 66 0.0692
## 6 Germany 4.20 58 0.125
## 7 Austria 3.10 24 0.0683
## 8 Slovenia 3.12 16 0.177
Next, we present the log(drug usage) (Figure 6.1a), PctYouth (Figure 6.1b), and log(GDP) (Figure 6.1c) data associated with amphetamine as simple point plots. For the first plot of reference country vs amphetamine consumption, Austria appears to consume the least amount of amphetamine, while Belgium and Germany appear to consume the most (Figure 6.1a). The PctYouth vs amphetamine consumption plot appears to show a positive relationship between PctYouth and amphetamine consumption (Figure 6.1b). The log(GDP) vs amphetamine plot does not appear to show any relationship (Figure 6.1c).
#Plot amphetamine data by country
ggplot(data = amphetplot, aes(x = country, y = log.Weekday.mean)) +
geom_point(size = 4) +
xlab("Country") +
ylab("Amphetamine Consumed (log (mg/1000 people/day))") +
ggtitle("Figure 6.1a") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))#Plot amphetamine data by age
ggplot(data = amphetplot, aes(x = PctYouth, y = log.Weekday.mean)) +
geom_point() +
xlab("Percent Youth (Ages 15-24)") +
ylab("Amphetamine Consumed (log(mg/1000 people/day))") +
ggtitle("Figure 6.1b") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))
#Plot amphetamine data by GDP
ggplot(data = amphetplot, aes(x = log.GDP, y = log.Weekday.mean)) +
geom_point() +
xlab("GDP(log(USD))") +
ylab("Amphetamine Consumed (log(mg/1000 people/day))") +
ggtitle("Figure 6.1c") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))Then, we fit a linear model to assess our hypothesis that the effect of country on amphetamine usage is a function of PctYouth and log(GDP). To address our hypothesis, we are particularly interested in the interaction effects between reference country : Pct Youth and reference country : log(GDP).
#Create Model
ampmod <- lm(data = amphetplot, log.Weekday.mean ~ country + PctYouth +
log.GDP + country:PctYouth + country:log.GDP) Before we can run the linear model with amphetamine data, we first had to verify that the assumptions of this analysis were met. In the residuals vs. fitted plot (Figure 6.1d), there does not appear to be any major hump-shapes or valleys, suggesting the linear model may be a good fit. In the Normal QQ plot (Figure 6.1e), most of the residuals fall on the line suggesting that these data are approximately normal. The scale-location plot (Figure 6.1f) does not show any obvious pattern, suggesting that our data have approximately equal variance. Finally, the Constant Leverage plot (Figure 6.1g) shows no obvious influential data points. Thus, we were ready to continue with the linear model for amphetamine.
#Check assumptions
amphetautoplot <- autoplot(ampmod, smooth.colour = NA) + theme_bw()
amphetautoplot[[1]] +
labs(title = "Figure 6.1d", # add labels
x = "Amphetamine Model Fitted Values",
y = "Resiudals") +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))
amphetautoplot[[2]] +
labs(title = "Figure 6.1e", # add labels
x = "Amphetamine Model Theoretical Quantities",
y = "Standardized Resiudals") +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))
amphetautoplot[[3]] +
labs(title = "Figure 6.1f", # add labels
x = "Amphetamine Model Fitted Values") +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))
amphetautoplot[[4]] +
labs(title = "Figure 6.1g", # add labels
x = "Amphetamine Model Factor Level Combination") +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))Then, we produce an ANOVA table and summary table to help interpret our model. Most of the variation in amphetamine usage is explained by PctYouth (Mean sq value of 21.27), and some variation is also explained by reference country (Mean sq value of 13.57). log(GDP) and all variable interactions did not capture any significant amount of variation. With this in mind, we will simplify our model to exclude log(GDP) and the interaction terms in the following subsection.
#Generate ANOVA table and summary
anova(ampmod)## Analysis of Variance Table
##
## Response: log.Weekday.mean
## Df Sum Sq Mean Sq F value Pr(>F)
## country 7 95.004 13.5720 21.9771 < 2.2e-16 ***
## PctYouth 1 21.273 21.2731 34.4476 1.148e-08 ***
## log.GDP 1 0.964 0.9643 1.5615 0.2124
## country:PctYouth 7 7.311 1.0444 1.6912 0.1106
## country:log.GDP 7 3.730 0.5328 0.8627 0.5364
## Residuals 303 187.118 0.6176
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(ampmod)##
## Call:
## lm(formula = log.Weekday.mean ~ country + PctYouth + log.GDP +
## country:PctYouth + country:log.GDP, data = amphetplot)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.1165 -0.5335 0.1072 0.4922 1.8601
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 30.4618 58.2639 0.523 0.6015
## countrySpain 82.8702 78.5346 1.055 0.2922
## countrySweden -339.8859 197.4717 -1.721 0.0862 .
## countrySwitzerland 48.2286 153.9444 0.313 0.7543
## countryFinland -30.8362 73.2539 -0.421 0.6741
## countryGermany -38.4430 87.8976 -0.437 0.6622
## countryAustria -26.3494 134.1149 -0.196 0.8444
## countrySlovenia 127.5023 290.3433 0.439 0.6609
## PctYouth -0.7935 0.4503 -1.762 0.0791 .
## log.GDP -0.6145 2.1349 -0.288 0.7737
## countrySpain:PctYouth -1.7674 0.8356 -2.115 0.0352 *
## countrySweden:PctYouth -0.3437 0.5284 -0.651 0.5159
## countrySwitzerland:PctYouth 0.1084 0.5575 0.194 0.8460
## countryFinland:PctYouth 0.3875 0.5502 0.704 0.4818
## countryGermany:PctYouth -0.3738 1.0795 -0.346 0.7294
## countryAustria:PctYouth 0.1103 0.8484 0.130 0.8966
## countrySlovenia:PctYouth 0.8678 4.6630 0.186 0.8525
## countrySpain:log.GDP -2.4279 2.8949 -0.839 0.4023
## countrySweden:log.GDP 12.7411 7.3737 1.728 0.0850 .
## countrySwitzerland:log.GDP -1.8619 5.5847 -0.333 0.7391
## countryFinland:log.GDP 0.9663 2.6947 0.359 0.7202
## countryGermany:log.GDP 1.4590 3.0373 0.480 0.6313
## countryAustria:log.GDP 0.8552 4.8600 0.176 0.8604
## countrySlovenia:log.GDP -5.6808 10.4862 -0.542 0.5884
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7858 on 303 degrees of freedom
## Multiple R-squared: 0.4067, Adjusted R-squared: 0.3617
## F-statistic: 9.032 on 23 and 303 DF, p-value: < 2.2e-16
Given the results of the ANOVA table produced below, we then simplify our model to exclude GDP and the variable interactions. These data suggest that there is a significant difference in amphetamine consumption among the European countries examined in this study, with Sweden consuming the most amphetamine and Slovenia consuming the least (F=21.67, df=7, p<0.05). PctYouth also had a significant effect on amphetamine consumption, with a lower proportion of youth in the population being associated with higher amounts of amphetamine usage (F=33.97, df=1, p<0.05). Since there were no significant interaction effects between reference country : PctYouth or reference country : log(GDP), our hypothesis that the effect of reference country on amphetamine usage depends on PctYouth and GDP was not supported.
#Recreate model without GDP and interaction terms
ampmod2 <- lm(data = amphetplot, log.Weekday.mean ~ country + PctYouth)
#Generate ANOVA table
anova(ampmod2)## Analysis of Variance Table
##
## Response: log.Weekday.mean
## Df Sum Sq Mean Sq F value Pr(>F)
## country 7 95.004 13.5720 21.674 < 2.2e-16 ***
## PctYouth 1 21.273 21.2731 33.973 1.371e-08 ***
## Residuals 318 199.122 0.6262
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Generate summary
summary(ampmod2)##
## Call:
## lm(formula = log.Weekday.mean ~ country + PctYouth, data = amphetplot)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.33266 -0.56376 0.08089 0.50235 2.07742
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 13.8310 1.5747 8.783 < 2e-16 ***
## countrySpain -2.5576 0.3044 -8.403 1.47e-15 ***
## countrySweden 0.4142 0.2501 1.656 0.098699 .
## countrySwitzerland -1.6067 0.1807 -8.891 < 2e-16 ***
## countryFinland -0.5321 0.1541 -3.454 0.000627 ***
## countryGermany -1.3542 0.2178 -6.216 1.60e-09 ***
## countryAustria -2.1264 0.2215 -9.602 < 2e-16 ***
## countrySlovenia -3.5184 0.4080 -8.623 3.15e-16 ***
## PctYouth -0.7876 0.1351 -5.829 1.37e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7913 on 318 degrees of freedom
## Multiple R-squared: 0.3687, Adjusted R-squared: 0.3528
## F-statistic: 23.21 on 8 and 318 DF, p-value: < 2.2e-16
Finally, we replot our amphetamine data using predicted values from our linear model by PctYouth in Figure 6.1h.
#Replot with age and country in same plot
#Make new x values
new.x4 <- expand.grid(
country = unique(amphetplot$country),
PctYouth = seq(from=min(amphetplot$PctYouth), to=max(amphetplot$PctYouth) +1),
log.GDP = mean(amphetplot$log.GDP))
#New y values predicted from the model
new.y4 <- predict(ampmod2, newdata = new.x4, interval = "confidence")
#Add the new x and y to a data frame called addThese2
addThese4 <- data.frame(new.x4, new.y4)
#Plot age vs cocaine consumption including countries
ggplot(data = addThese4, aes(x = PctYouth, y = fit, group = country, color = country)) +
geom_point() +
geom_smooth(method="lm") +
geom_errorbar(data = addThese4, aes(ymin = lwr, ymax = upr), size = 0.2, stat = "identity") +
xlab("Percent Youth (Ages 15-24)") +
ylab("Mean Amphetamine Usage
(log mg/1000 people/day)") +
ggtitle("Figure 6.1h") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))In this subsection, we look at cocaine usage. To perform this analysis, we created a subset of only the cocaine data in a new dataframe called cocaineplot. We present the summary statistics (mean, sample size, and standard error by country) for the cocaine data below.
#Create new dataframe containing only data pertaining to cocaine
cocaineplot <- data %>%
filter(drug == "cocaine") %>%
select(country, PctYouth, log.Weekday.mean, log.GDP)
#Create dataframe with mean cocaine levels, sample size, and standard error by country
plot2 <- cocaineplot %>%
group_by(country) %>%
summarise(
mean = mean(log.Weekday.mean, na.rm = TRUE),
samplesize = n(),
SE = std.error(log.Weekday.mean)
)
plot2## # A tibble: 8 × 4
## country mean samplesize SE
## <fct> <dbl> <int> <dbl>
## 1 Belgium 5.44 47 0.141
## 2 Spain 4.52 66 0.190
## 3 Sweden 3.61 11 0.298
## 4 Switzerland 6.01 49 0.0681
## 5 Finland 2.77 66 0.0744
## 6 Germany 4.55 58 0.134
## 7 Austria 4.77 24 0.152
## 8 Slovenia 5.40 10 0.189
Next, we plot the log(drug) (Figure 6.2a), PctYouth (Figure 6.2b), and log(GDP) (Figure 6.2c) data associated with cocaine, presented as simple point plots. Thus, we present three plots. For the first plot of reference country vs cocaine consumption, Finland appears to consume the least amount of cocaine while Belgium appears to consume the most (Figure 6.2a). The PctYouth vs cocaine consumption plot appears to show a negative relationship between PctYouth and cocaine consumption (Figure 6.2b). The log(GDP) vs cocaine plot does not appear to show any relationship (Figure 6.2c).
#Plot our cocaine data by country
ggplot(data = cocaineplot, aes(x = country, y = log.Weekday.mean)) +
geom_point(size = 4) +
xlab("Country") +
ylab("Cocaine Consumed (log(mg/1000 people/day))") +
ggtitle("Figure 6.2a") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))#Plot our cocaine data by age
ggplot(data = cocaineplot, aes(x = PctYouth, y = log.Weekday.mean)) +
geom_point() +
xlab("Percent Youth (Ages 15-24)") +
ylab("Cocaine Consumed (log(mg/1000 people/day))") +
ggtitle("Figure 6.2b") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))
#Plot our cocaine data by GDP
ggplot(data = cocaineplot, aes(x = log.GDP, y = log.Weekday.mean)) +
geom_point() +
xlab("GDP (log(USD))") +
ylab("Cocaine Consumed (log(mg/1000 people/day))") +
ggtitle("Figure 6.2c") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))Then, we fit a linear model, where we hypothesize that the effect of reference country on cocaine usage is a function of PctYouth and log(GDP), using the variables from the cocaineplot data frame. To address our hypothesis, we are particularly interested the interaction effects reference country : PctYouth and reference country : log(GDP).
#Create Model
cocmod <- lm(data = cocaineplot, log.Weekday.mean ~ country + PctYouth +
log.GDP + country:PctYouth + country:log.GDP)Before we can run the linear model with cocaine data, we first had to verify that the assumptions of this analysis were met. In the residuals vs. fitted plot (Figure 6.2d), there does not appear to be any major hump-shapes or valleys, suggesting the linear model may be a good fit. In the Normal QQ plot (Figure 6.2e), most of the residuals fall on the line suggesting that these data are approximately normal. The scale-location plot (Figure 6.2f) does not show any obvious pattern, suggesting that our data have approximately equal variance. Finally, the Constant Leverage plot (Figure 6.2g) shows no obvious influential data points. Thus, we were ready to continue with the linear model for cocaine.
#Check assumptions
cocautoplot <- autoplot(cocmod, smooth.colour = NA) + theme_bw()
cocautoplot[[1]] +
labs(title = "Figure 6.2d", # add labels
x = "Cocaine Model Fitted Values",
y = "Resiudals") +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))
cocautoplot[[2]] +
labs(title = "Figure 6.2e", # add labels
x = "Cocaine Model Theoretical Quantities",
y = "Standardized Resiudals") +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))
cocautoplot[[3]] +
labs(title = "Figure 6.2f", # add labels
x = "Cocaine Model Fitted Values") +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))
cocautoplot[[4]] +
labs(title = "Figure 6.2g", # add labels
x = "Cocaine Model Factor Level Combination") +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))Next, we produce an ANOVA table and summary table to help interpret our model. First looking at the ANOVA table produced below, most of the variation in cocaine usage is explained by reference country (Mean sq value of 52.613), and some variation is also explained by PctYouth (Mean sq value of 7.349), log(GDP) (Mean sq value of 8.156), the interaction between country and PctYouth (Mean sq value of 3.970), and the interaction between country and log(GDP) (Mean sq value of 1.849).
Results from the ANOVA table suggest that there is a significant interaction effect between reference country : PctYouth (F=4.67, df=7, p<0.05) and reference country : log(GDP) (F=2.18, df=7, p<0.05), which suggests that our hypothesis that the effect of reference country on cocaine usage is a function of PctYouth and GDP was supported. Since the interaction effects were significant, the main effects cannot be interpreted alone.
#Generate ANOVA table and summary
anova(cocmod)## Analysis of Variance Table
##
## Response: log.Weekday.mean
## Df Sum Sq Mean Sq F value Pr(>F)
## country 7 368.29 52.613 61.9014 < 2.2e-16 ***
## PctYouth 1 7.35 7.349 8.6470 0.003525 **
## log.GDP 1 8.16 8.156 9.5961 0.002130 **
## country:PctYouth 7 27.79 3.970 4.6705 5.433e-05 ***
## country:log.GDP 7 12.94 1.849 2.1750 0.036263 *
## Residuals 307 260.93 0.850
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(cocmod)##
## Call:
## lm(formula = log.Weekday.mean ~ country + PctYouth + log.GDP +
## country:PctYouth + country:log.GDP, data = cocaineplot)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.00355 -0.49919 0.00044 0.52605 2.77911
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -17.54079 68.31162 -0.257 0.797524
## countrySpain -261.28924 92.04349 -2.839 0.004831 **
## countrySweden 302.93586 268.21411 1.129 0.259588
## countrySwitzerland -23.44064 176.64377 -0.133 0.894518
## countryFinland 29.02660 86.45787 0.336 0.737303
## countryGermany 31.95908 103.09092 0.310 0.756765
## countryAustria 37.55696 157.32096 0.239 0.811476
## countrySlovenia 471.21647 381.57122 1.235 0.217798
## PctYouth -0.76215 0.49820 -1.530 0.127097
## log.GDP 1.18184 2.49932 0.473 0.636645
## countrySpain:PctYouth 3.54039 0.89985 3.934 0.000103 ***
## countrySweden:PctYouth 0.50043 0.59868 0.836 0.403869
## countrySwitzerland:PctYouth 0.22038 0.61075 0.361 0.718477
## countryFinland:PctYouth -0.08913 0.62017 -0.144 0.885820
## countryGermany:PctYouth -0.67960 1.25419 -0.542 0.588304
## countryAustria:PctYouth 0.65721 0.97971 0.671 0.502838
## countrySlovenia:PctYouth -5.05151 6.61438 -0.764 0.445622
## countrySpain:log.GDP 8.00772 3.38388 2.366 0.018581 *
## countrySweden:log.GDP -11.48903 10.01414 -1.147 0.252159
## countrySwitzerland:log.GDP 0.76289 6.39506 0.119 0.905121
## countryFinland:log.GDP -1.13794 3.17493 -0.358 0.720280
## countryGermany:log.GDP -0.99928 3.55957 -0.281 0.779107
## countryAustria:log.GDP -1.70790 5.69929 -0.300 0.764633
## countrySlovenia:log.GDP -17.17425 13.50627 -1.272 0.204486
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9219 on 307 degrees of freedom
## Multiple R-squared: 0.6193, Adjusted R-squared: 0.5908
## F-statistic: 21.72 on 23 and 307 DF, p-value: < 2.2e-16
Finally, we replot our cocaine data using predicted values from our linear model to visualize the interaction. In Figure 6.2h, we plot the cocaine model predicted values by PctYouth. In Figure 6.2i, we plot the cocaine model predicted values by log(GDP).
#Now do the same with age
#Make new x values
new.x2 <- expand.grid(
country = unique(cocaineplot$country),
PctYouth = seq(from=min(cocaineplot$PctYouth), to=max(cocaineplot$PctYouth) +1),
log.GDP = mean(cocaineplot$log.GDP))
#New y values predicted from the model
new.y2 <- predict(cocmod, newdata = new.x2, interval = "confidence")
#Add the new x and y to a data frame called addThese2
addThese2 <- data.frame(new.x2, new.y2)
#Plot age vs cocaine consumption including countries
ggplot(data = addThese2, aes(x = PctYouth, y = fit, group = country, color = country)) +
geom_point() +
geom_smooth(method="lm") +
geom_errorbar(data = addThese2, aes(ymin = lwr, ymax = upr), size = 0.2, stat = "identity") +
xlab("Percent Youth (Ages 15-24)") +
ylab("Mean Cocaine Consumed (log mg/1000 people/day)") +
ggtitle("Figure 6.2h") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))
# Plot by log(GDP) vs cocaine by country to visualize the interaction.
# Make new x values
new.x1 <- expand.grid(
country = unique(cocaineplot$country),
PctYouth = mean(cocaineplot$PctYouth),
log.GDP = seq(from=min(cocaineplot$log.GDP), to=max(cocaineplot$log.GDP) +1))
#New y values predicted from the model
new.y1 <- predict(cocmod, newdata = new.x1, interval = "confidence")
#Add the new x and y to a data frame called addThese1
addThese1 <- data.frame(new.x1, new.y1)
#Plot GDP and cocaine consumption by country using predicted values
ggplot(data = addThese1, aes(x = log.GDP, y = fit, group = country, color = country)) +
geom_point() +
geom_smooth(method="lm") +
geom_errorbar(data = addThese1, aes(ymin = lwr, ymax = upr), size = 0.2, stat = "identity") +
xlab("log(GDP)") +
ylab("Mean Cocaine Consumed (log mg/1000 people/day)") +
ggtitle("Figure 6.2i") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))In the following subsection, we investigate MDMA usage. To perform this analysis, we created a subset of the MDMA data in a new dataframe called MDMAplot. The summary statistics (mean, sample size, and standard error by country) are presented below.
#Create new dataframe containing only data pertaining to MDMA
MDMAplot <- data %>%
filter(drug == "MDMA") %>%
select(country, PctYouth, log.Weekday.mean, log.GDP)
#Create dataframe with mean MDMA levels, sample size, and standard error by country
plot5 <- MDMAplot %>%
group_by(country) %>%
summarise(
mean = mean(log.Weekday.mean, na.rm = TRUE),
samplesize = n(),
SE = std.error(log.Weekday.mean)
)
plot5## # A tibble: 8 × 4
## country mean samplesize SE
## <fct> <dbl> <int> <dbl>
## 1 Belgium 3.33 46 0.0860
## 2 Spain 2.98 64 0.0480
## 3 Sweden 2.98 11 0.162
## 4 Switzerland 3.13 49 0.0407
## 5 Finland 2.98 68 0.0386
## 6 Germany 3.05 58 0.0531
## 7 Austria 2.91 24 0.0495
## 8 Slovenia 2.92 10 0.0874
Next, we present three plots: MDMA consumed by European country (Figure 6.3a), MDMA consumed vs. PctYouth (Figure 6.3b), and MDMA consumed vs. GDP (Figure 6.3c). For the first plot of reference country vs MDMA consumption, Sweden appears to consume the least amount of MDMA while Belgium appears to consume the most (Figure 6.3a). The PctYouth vs MDMA consumption plot does not appear to show any relationship (Figure 6.3b), and neither does the log(GDP) vs MDMA plot (Figure 6.3c).
#Plot MDMA data by country
ggplot(data = MDMAplot, aes(x = country, y = log.Weekday.mean)) +
geom_point(size = 4) +
xlab("Country") +
ylab("MDMA Consumed (log(mg/1000 people/day))") +
ggtitle("Figure 6.3a") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))#Plot MDMA data by age
ggplot(data = MDMAplot, aes(x = PctYouth, y = log.Weekday.mean)) +
geom_point() +
xlab("Percent Youth (Ages 15-24)") +
ylab("MDMA consumed (log(mg/1000 people/day))") +
ggtitle("Figure 6.3b") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))
#Plot MDMA data by GDP
ggplot(data = MDMAplot, aes(x = log.GDP, y = log.Weekday.mean)) +
geom_point() +
xlab("GDP (log(USD))") +
ylab("MDMA consumed (log(mg/1000 people/day))") +
ggtitle("Figure 6.3c") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))Then, we fit a linear model, where we hypothesize that the effect of reference country on MDMA usage is a function of PctYouth and GDP, using the variables from the MDMAplot data frame. To address our hypothesis, we are particularly interested the interaction effects between reference country : PctYouth and reference country : log(GDP).
#Create Model
mdmamod <- lm(data = MDMAplot, log.Weekday.mean ~ country + PctYouth +
log.GDP + country:PctYouth + country:log.GDP)Before we can run the linear model with MDMA data, we first had to verify that the assumptions of this analysis were met. In the residuals vs. fitted plot (Figure 6.3d), there does not appear to be any major hump-shapes or valleys, suggesting the linear model may be a good fit. The Normal QQ plot (Figure 6.3e) isn’t perfect, the positive residuals seem to be larger than expected, however, most of the other residuals fall on the line suggesting that these data are approximately normal. The scale-location plot (Figure 6.3f) does not show any obvious pattern, suggesting that our data have approximately equal variance. Finally, the Constant Leverage plot (Figure 6.3g) shows no obvious influential data points. Thus, we were ready to continue with the linear model for MDMA.
#Check assumptions
MDMAautoplot <- autoplot(mdmamod, smooth.colour = NA) + theme_bw()
MDMAautoplot[[1]] +
labs(title = "Figure 6.3d", # add labels
x = "MDMA Model Fitted Values",
y = "Resiudals") +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))
MDMAautoplot[[2]] +
labs(title = "Figure 6.3e", # add labels
x = "MDMA Model Theoretical Quantities",
y = "Standardized Resiudals") +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))
MDMAautoplot[[3]] +
labs(title = "Figure 6.3f", # add labels
x = "MDMA Model Fitted Values") +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))
MDMAautoplot[[4]] +
labs(title = "Figure 6.3g", # add labels
x = "MDMA Model Factor Level Combination") +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))Thus, we produce an ANOVA table and summary table to help interpret our model. First looking at the ANOVA table produced below, most of the variation in MDMA usage is explained by PctYouth (Mean sq value of 4.59). Some variation is also explained by reference country (Mean sq of 0.73). log(GDP) and all variable interactions did not capture any significant amount of variation. With this in mind, we will simplify our model to exclude log(GDP) and the interaction terms in the following section.
#Generate ANOVA table and summary
anova(mdmamod)## Analysis of Variance Table
##
## Response: log.Weekday.mean
## Df Sum Sq Mean Sq F value Pr(>F)
## country 7 5.127 0.7324 5.1533 1.465e-05 ***
## PctYouth 1 4.593 4.5928 32.3168 3.056e-08 ***
## log.GDP 1 0.293 0.2926 2.0588 0.1523
## country:PctYouth 7 0.907 0.1296 0.9120 0.4973
## country:log.GDP 7 0.266 0.0380 0.2673 0.9662
## Residuals 306 43.488 0.1421
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(mdmamod)##
## Call:
## lm(formula = log.Weekday.mean ~ country + PctYouth + log.GDP +
## country:PctYouth + country:log.GDP, data = MDMAplot)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.7918 -0.2443 -0.0384 0.1841 1.3732
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 21.08331 28.31814 0.745 0.4571
## countrySpain -13.60951 37.94324 -0.359 0.7201
## countrySweden 12.51519 109.77401 0.114 0.9093
## countrySwitzerland -18.52768 72.38106 -0.256 0.7981
## countryFinland -1.77287 35.40797 -0.050 0.9601
## countryGermany -2.05565 42.41088 -0.048 0.9614
## countryAustria 28.60916 64.49811 0.444 0.6577
## countrySlovenia 139.90580 156.09754 0.896 0.3708
## PctYouth -0.46911 0.20502 -2.288 0.0228 *
## log.GDP -0.45609 1.03823 -0.439 0.6608
## countrySpain:PctYouth -0.35218 0.37198 -0.947 0.3445
## countrySweden:PctYouth 0.09349 0.24589 0.380 0.7041
## countrySwitzerland:PctYouth 0.26412 0.25080 1.053 0.2931
## countryFinland:PctYouth 0.12617 0.25063 0.503 0.6150
## countryGermany:PctYouth -0.15621 0.51337 -0.304 0.7611
## countryAustria:PctYouth 0.45198 0.40127 1.126 0.2609
## countrySlovenia:PctYouth -0.58775 2.70478 -0.217 0.8281
## countrySpain:log.GDP 0.58043 1.39615 0.416 0.6779
## countrySweden:log.GDP -0.50944 4.09896 -0.124 0.9012
## countrySwitzerland:log.GDP 0.56165 2.62139 0.214 0.8305
## countryFinland:log.GDP -0.01347 1.30349 -0.010 0.9918
## countryGermany:log.GDP 0.13117 1.46699 0.089 0.9288
## countryAustria:log.GDP -1.28246 2.33766 -0.549 0.5837
## countrySlovenia:log.GDP -5.55083 5.52587 -1.005 0.3159
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.377 on 306 degrees of freedom
## Multiple R-squared: 0.2046, Adjusted R-squared: 0.1448
## F-statistic: 3.422 on 23 and 306 DF, p-value: 5.187e-07
Based on the results of the ANOVA table shown below, we simplify our model to exclude log(GDP) and the interaction terms.Results from the ANOVA table suggest that there is a significant difference in MDMA consumption among the European countries examined in this study, with Belgium consuming the most MDMA and Slovenia consuming the least (F=5.23, df=7, p<0.05). Age also had a significant effect on MDMA consumption, with a lower proportion of youth in the population being associated with higher amounts of MDMA usage (F=32.80, df=1, p<0.05). Since there were no significant interaction effects between reference country : PctYouth or reference country : log(GDP), our hypothesis that the effect of reference country on MDMA usage is a function of PctYouth and log(GDP) was not supported.
#Recreate model without GDP, age, and interaction terms
MDMAmod2 <- lm(data = MDMAplot, log.Weekday.mean ~ country + PctYouth)
#Generate ANOVA table
anova(MDMAmod2)## Analysis of Variance Table
##
## Response: log.Weekday.mean
## Df Sum Sq Mean Sq F value Pr(>F)
## country 7 5.127 0.7324 5.2296 1.151e-05 ***
## PctYouth 1 4.593 4.5928 32.7956 2.354e-08 ***
## Residuals 321 44.954 0.1400
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Generate summary
summary(MDMAmod2)##
## Call:
## lm(formula = log.Weekday.mean ~ country + PctYouth, data = MDMAplot)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.79226 -0.26697 -0.04389 0.20348 1.39415
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.29886 0.69583 10.489 < 2e-16 ***
## countrySpain -1.01336 0.13695 -7.399 1.21e-12 ***
## countrySweden -0.21835 0.12769 -1.710 0.0882 .
## countrySwitzerland -0.33219 0.08045 -4.129 4.65e-05 ***
## countryFinland -0.35859 0.07149 -5.016 8.74e-07 ***
## countryGermany -0.66749 0.10025 -6.659 1.20e-10 ***
## countryAustria -0.65860 0.10354 -6.361 6.91e-10 ***
## countrySlovenia -1.25628 0.19800 -6.345 7.57e-10 ***
## PctYouth -0.34097 0.05954 -5.727 2.35e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3742 on 321 degrees of freedom
## Multiple R-squared: 0.1778, Adjusted R-squared: 0.1573
## F-statistic: 8.675 on 8 and 321 DF, p-value: 9.954e-11
In Figure 6.3h, we replot our MDMA data using predicted values from our linear model by PctYouth.
#Plot PctYouth and country vs mean MDMA consumption
#Make new x values
new.x7 <- expand.grid(
country = unique(MDMAplot$country),
PctYouth = seq(from=min(MDMAplot$PctYouth), to=max(MDMAplot$PctYouth) +1),
log.GDP = mean(MDMAplot$log.GDP))
#New y values predicted from the model
new.y7 <- predict(MDMAmod2, newdata = new.x7, interval = "confidence")
#Add the new x and y to a data frame called addThese7
addThese7 <- data.frame(new.x7, new.y7)
#Plot age vs cocaine consumption including countries
ggplot(data = addThese7, aes(x = PctYouth, y = fit, group = country, color = country)) +
geom_point() +
geom_smooth(method="lm") +
geom_errorbar(data = addThese7, aes(ymin = lwr, ymax = upr), size = 0.2, stat = "identity") +
xlab("Percent Youth (Ages 15-24)") +
ylab("Mean MDMA Consumed (log(mg/1000 people/day))") +
ggtitle("Figure 6.3h") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))Finally, in this last subsection, we look at methamphetamine usage. To perform this analysis, we created a subset of only the methamphetamine usage data in a new dataframe called methplot. Summary statistics (mean, sample size, and standard error by country) are shown below.
#Create new dataframe containing only data pertaining to methamphetamine
methplot <- data %>%
filter(drug == "methamphetamine") %>%
select(country, PctYouth, log.Weekday.mean, log.GDP)
#Create table with mean methamphetamine levels, sample size, and standard error by country
plot4 <- methplot %>%
group_by(country) %>%
summarise(
mean = mean(log.Weekday.mean, na.rm = TRUE),
samplesize = n(),
SE = std.error(log.Weekday.mean)
)
plot4## # A tibble: 8 × 4
## country mean samplesize SE
## <fct> <dbl> <int> <dbl>
## 1 Belgium 2.65 47 0.0611
## 2 Spain 2.64 66 0.0622
## 3 Sweden 2.63 15 0.133
## 4 Switzerland 3.09 43 0.0946
## 5 Finland 2.97 64 0.0692
## 6 Germany 3.49 58 0.172
## 7 Austria 2.51 24 0.0308
## 8 Slovenia 2.56 11 0.150
Then, we plot the log(drug usage) (Figure 6.4a), PctYouth (Figure 6.4b), and log(GDP) (Figure 6.4c) data associated with methamphetamine, presented as simple point plots. For the first plot of reference country vs methamphetamine consumption, Austria appears to consume the least amount of methamphetamine, while Germany appears to consume the most (Figure 6.4a). The PctYouth vs methamphetamine consumption plot does not appear to show any relationship (Figure 6.4b). The log(GDP) vs methamphetamine plot looks like there may be a positive relationship between log(GDP) and methamphetamine consumption (Figure 6.4c).
#Plot methamphetamine data by country
ggplot(data = methplot, aes(x = country, y = log.Weekday.mean)) +
geom_point(size = 4) +
xlab("Country") +
ylab("Methamphetamine Usage
(log(mg/1000 people/day))") +
ggtitle("Figure 6.3a") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))#Plot methamphetamine data by age
ggplot(data = methplot, aes(x = PctYouth, y = log.Weekday.mean)) +
geom_point() +
xlab("Percent Youth (Ages 15-24)") +
ylab("Methamphetamine Usage
(log(mg/1000 people/day))") +
ggtitle("Figure 6.3b") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))
#Plot methamphetamine data by GDP
ggplot(data = methplot, aes(x = log.GDP, y = log.Weekday.mean)) +
geom_point() +
xlab("GDP (log(USD))") +
ylab("Methamphetamine Usage
(log(mg/1000 people/day))") +
ggtitle("Figure 6.3c") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))Next, we fit a linear model to assess our hypothesis that the effect of reference country on methamphetamine usage is a function of PctYouth and log(GDP). We are particularly interested the interaction effects between reference country : PctYouth and reference country : log(GDP).
#Create Model
methmod <- lm(data = methplot, log.Weekday.mean ~ country + PctYouth +
log.GDP + country:PctYouth + country:log.GDP)Before we can run the linear model with methamphetamine data, we first had to verify that the assumptions of this analysis were met. In the residuals vs. fitted plot (Figure 6.4d), there does not appear to be any major hump-shapes or valleys, suggesting the linear model may be a good fit. The Normal QQ plot (Figure 6.4e) isn’t perfect, the positive residuals seem to be larger than expected, however, most of the other residuals fall on the line suggesting that these data are approximately normal. The scale-location plot (Figure 6.4f) does not show any obvious pattern, suggesting that our data have approximately equal variance. Finally, the Constant Leverage plot (Figure 6.4g) shows no obvious influential data points. Thus, we were ready to continue with the linear model for methamphetamine.
#Check assumptions
methautoplot <- autoplot(methmod, smooth.colour = NA) + theme_bw()
methautoplot[[1]] +
labs(title = "Figure 6.4d", # add labels
x = "Methamphetamine Model Fitted Values",
y = "Resiudals") +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))
methautoplot[[2]] +
labs(title = "Figure 6.4e", # add labels
x = "Methamphetamine Model Theoretical Quantities",
y = "Standardized Resiudals") +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))
methautoplot[[3]] +
labs(title = "Figure 6.4f", # add labels
x = "Methamphetamine Model Fitted Values") +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))
methautoplot[[4]] +
labs(title = "Figure 6.4g", # add labels
x = "Methamphetamine Model Factor Level Combination") +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))We produce an ANOVA table and summary table to help interpret our model. First looking at the ANOVA table produced below, most of the variation in methamphetamine usage is explained by reference country (Mean sq value of 5.03). A significant amount of variation was also explained by the interaction between reference country and PctYouth (Mean sq value of 1.04). While log(GDP), PctYouth, and all other variable interactions did not capture any significant amount of variation. With this in mind, we will simplify our model to exclude log(GDP), PctYouth and the interaction between reference country and log(GDP).
#Generate ANOVA table and summary
anova(methmod)## Analysis of Variance Table
##
## Response: log.Weekday.mean
## Df Sum Sq Mean Sq F value Pr(>F)
## country 7 35.208 5.0298 10.1483 2.027e-11 ***
## PctYouth 1 1.346 1.3458 2.7153 0.10043
## log.GDP 1 0.050 0.0500 0.1010 0.75088
## country:PctYouth 7 7.250 1.0357 2.0897 0.04446 *
## country:log.GDP 7 4.839 0.6913 1.3948 0.20690
## Residuals 304 150.671 0.4956
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
summary(methmod)##
## Call:
## lm(formula = log.Weekday.mean ~ country + PctYouth + log.GDP +
## country:PctYouth + country:log.GDP, data = methplot)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.5017 -0.3675 -0.1477 0.2276 2.3204
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.86653 52.16471 -0.170 0.8651
## countrySpain -23.93406 70.28705 -0.341 0.7337
## countrySweden 374.61026 199.45175 1.878 0.0613 .
## countrySwitzerland 32.45723 137.16714 0.237 0.8131
## countryFinland 94.93113 66.12107 1.436 0.1521
## countryGermany -30.42431 78.72319 -0.386 0.6994
## countryAustria 42.66473 120.13481 0.355 0.7227
## countrySlovenia 148.62411 281.76235 0.527 0.5982
## PctYouth -0.41878 0.38044 -1.101 0.2719
## log.GDP 0.60808 1.90855 0.319 0.7502
## countrySpain:PctYouth 0.10361 0.68715 0.151 0.8803
## countrySweden:PctYouth 0.82704 0.43764 1.890 0.0597 .
## countrySwitzerland:PctYouth -0.22490 0.47120 -0.477 0.6335
## countryFinland:PctYouth 0.07291 0.48399 0.151 0.8804
## countryGermany:PctYouth -1.68455 0.95773 -1.759 0.0796 .
## countryAustria:PctYouth 0.06624 0.74813 0.089 0.9295
## countrySlovenia:PctYouth -3.31343 4.78856 -0.692 0.4895
## countrySpain:log.GDP 0.77175 2.58403 0.299 0.7654
## countrySweden:log.GDP -14.21898 7.43586 -1.912 0.0568 .
## countrySwitzerland:log.GDP -1.09308 4.97312 -0.220 0.8262
## countryFinland:log.GDP -3.61590 2.42630 -1.490 0.1372
## countryGermany:log.GDP 1.63310 2.71819 0.601 0.5484
## countryAustria:log.GDP -1.63186 4.35215 -0.375 0.7080
## countrySlovenia:log.GDP -4.77991 10.02886 -0.477 0.6340
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.704 on 304 degrees of freedom
## Multiple R-squared: 0.2442, Adjusted R-squared: 0.1871
## F-statistic: 4.272 on 23 and 304 DF, p-value: 1.546e-09
Given the results of the ANOVA table produced below, we simplify our model to exclude PctYouth, log(GDP), and the interaction between reference country and log(GDP). Results from the ANOVA table suggest that there is a significant interaction effect between reference country and PctYouth (F=2.14, df=7, p<0.05), which partially supports our hypothesis that the effect of reference country on methamphetamine usage is a function of PctYouth and log(GDP).
#Recreate model without GDP, age, and interaction terms
methmod2 <- lm(data = methplot, log.Weekday.mean ~ country + country:PctYouth)
#Generate ANOVA table
anova(methmod2)## Analysis of Variance Table
##
## Response: log.Weekday.mean
## Df Sum Sq Mean Sq F value Pr(>F)
## country 7 35.208 5.0298 10.0853 2.237e-11 ***
## country:PctYouth 8 8.555 1.0694 2.1442 0.03162 *
## Residuals 312 155.601 0.4987
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#Generate summary
summary(methmod2)##
## Call:
## lm(formula = log.Weekday.mean ~ country + country:PctYouth, data = methplot)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.4471 -0.3624 -0.1889 0.2485 2.3507
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.6945 4.4119 1.744 0.08214 .
## countrySpain -3.4146 6.8726 -0.497 0.61965
## countrySweden -6.5247 4.7804 -1.365 0.17327
## countrySwitzerland 2.5015 5.2129 0.480 0.63166
## countryFinland -2.9377 5.5053 -0.534 0.59398
## countryGermany 20.4933 10.0167 2.046 0.04160 *
## countryAustria -2.1745 7.6777 -0.283 0.77720
## countrySlovenia 15.4100 30.0315 0.513 0.60823
## countryBelgium:PctYouth -0.4336 0.3788 -1.145 0.25321
## countrySpain:PctYouth -0.1688 0.5433 -0.311 0.75621
## countrySweden:PctYouth 0.1250 0.1568 0.797 0.42580
## countrySwitzerland:PctYouth -0.6298 0.2458 -2.562 0.01088 *
## countryFinland:PctYouth -0.1542 0.2844 -0.542 0.58807
## countryGermany:PctYouth -2.3498 0.8554 -2.747 0.00636 **
## countryAustria:PctYouth -0.2755 0.5748 -0.479 0.63206
## countrySlovenia:PctYouth -2.2461 3.2481 -0.692 0.48976
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.7062 on 312 degrees of freedom
## Multiple R-squared: 0.2195, Adjusted R-squared: 0.182
## F-statistic: 5.85 on 15 and 312 DF, p-value: 1.111e-10
In Figure 6.4h, we replot our methamphetamine model predicted values by PctYouth to visualize the interaction.
#Replot with age and country in same plot
#Make new x values
new.x5 <- expand.grid(
country = unique(methplot$country),
PctYouth = seq(from=min(methplot$PctYouth), to=max(methplot$PctYouth) +1),
log.GDP = mean(methplot$log.GDP))
#New y values predicted from the model
new.y5 <- predict(methmod2, newdata = new.x5, interval = "confidence")
#Add the new x and y to a data frame called addThese5
addThese5 <- data.frame(new.x5, new.y5)
#Plot age vs cocaine consumption including countries
ggplot(data = addThese5, aes(x = PctYouth, y = fit, group = country, color = country)) +
geom_point() +
geom_smooth(method="lm") +
geom_errorbar(data = addThese5, aes(ymin = lwr, ymax = upr), size = 0.2, stat = "identity") +
xlab("Percent Youth (Ages 15-24)") +
ylab("Mean Methamphetamine Usage
(log mg/1000 people/day)") +
ggtitle("Figure 6.4h") +
theme_classic() +
theme(plot.title = element_text(hjust = 0.5)) + # center title
theme(plot.title = element_text(face = "bold")) + # bold title
theme(text = element_text(size=14))Bannwarth, A., Morelato, M., Benaglia, L., Been, F., Esseiva, P., Delemont, O. and Roux, C. (2019). The use of wastewater analysis in forensic intelligence: drug consumption comparison between Sydney and different European cities. Forensic Sci. Res. 4, 141.
Beckerman, A. P., Childs, D. Z., & Petchey, O. L. (2017). Getting Started with R: An Introduction for Biologists 2nd Edition. Oxford, UK: Oxford University Press.
Betancourt W. Q., Schmitz B. W., Innes, G. K., Prasek, S. M., & Brown, P. (2021). COVID-19 containment on a college campus via wastewater-based epidemiology, targeted clinical testing and an intervention. Science of The Total Environment, 779, 0048-9697
Chen K., Kandel D. B., Davies M. (1997). Relationships between frequency and quantity of marijuana use and last year proxy dependence among adolescents and adults in the United States. Drug Alcohol Depend, 46, 53–67
Compton, W. M., Cottler, L. B., Phelps, D. L., Abdallah, B. A, & Spitznagel, E. L. (2000). Psychiatric disorders among drug dependent subjects: are they primary or secondary? Am J Addict, 9, 126–134
Concheiro, M., De Castro, A., Quintela, O., Cruz, A. and Rivadulla, M. L. (2007) Determination of Illicit Drugs and their Metabolites in Human Urine by Liquid Chromatography Tandem Mass Spectrometry Including Relative Ion Intensity Criterion. J. Anal. Toxicol. 31,.
Lipari, R. N. and Park-Lee, E. (2019). Key Substance Use and Mental Health Indicators in the United States: Results from the 2018 National Survey on Drug Use and Health.
Metcalf, T. G., Melnick, J. L. and Estes, M. K. (1995). Environmental virology: from detection of virus in sewage and water by isolation to identification by molecular biology–a trip of over 50 years. Annu. Rev. Microbiol. 49, 461–487.
Moshe, S. (2011). GDP as a Measure of Economic Welfare. ICER Working Paper, Retrieved from https://ssrn.com/abstract=1808685
Nieves, C., Cadet -James, Y., Mccalman, J. and Haswell, M. SOCIAL DETERMINANTS OF DRUG USE What works? A review of act ions addressing the social and economic determinants of Indigenous he… Katy Osborne A literature review for Indigenous men’s groups.
Russell, J. M., Newman, S. C., & Bland, R. C. (1994). Epidemiology of psychiatric disorders in Edmonton: Drug abuse and dependence. Acta Psychiatr Scand Suppl, 376, 54–62
Wastewater analysis and drugs — a European multi-city study | www.emcdda.europa.eu.
Zuccato, E., Chiabrando, C., Castiglioni, S., Bagnati, R. and Fanelli, R. (2008). Estimating community drug abuse by wastewater analysis. Environ. Health Perspect. 116, 1027–1032.
Ethical Considerations
The EMCDDA advises several considerations for using the data from their study (https://score-cost.eu/wp-content/uploads/sites/118/2016/11/WBE-ethical-guidelines-FINAL-March-2016-.pdf); first, in order for a country to be adequately represented by its cities, the sample size must be 10% of the total population or contain measurements from at least five locations. Next, studies should take care not to mischaracterize certain demographics, particularly marginalized communities. Additionally, partnerships between researchers and firms who wish to act on findings should be chosen responsibly. Collaborations necessitate mutual vetting between parties to ensure usage of research findings aligns with the EMCDDA’s mission of ethical practice.
In acknowledgement of these considerations, we have sampled seven cities from each country and applied these data to examine relationships among the broad demographic factors of age and GDP. These countries collectively represent more than %15 of Europe’s total population. These factors are not inclined to reveal any controversial relationships between the population and its drug use. Instead, this investigation aims to service communities in general by reducing uncertainty in the pursuit of optimizing public health. Moreover, our study has no external partnerships and communication of findings will retain academic and ethical integrity.
Project Metadata
Drug metabolite data
Intent of data:
The intent of this data set was to determine the level of use of the drugs cocaine, cannabis, methamphetamines, amphetamines, and MDMA across Europe.
Data type:
This dataset takes the form of categorical data.
Data acquisition:
This data was accessed through the European Monitoring Centre for Drugs and Drug addiction website: (https://www.emcdda.europa.eu/publications/html/pods/waste-water-analysis_en#siteInfoTable)
Study site(s):
The study sites in this data set included wastewater treatment plants from 137 cities across 29 European countries.
Note: cities followed by a number in parantheses indicate where an average was found for multiple waste-water treatment plants
Countries and cities:
Austria: Graz, Hall-Wattens, Innsbruck, Kapfenburg, Klosterneuburg, Kufstein, Murzzuschlag, Purgstall, Region Millstattersee, Strass im Zillertal, Wasserverbard Hofsteig
Bosnia: Sarajevo
Belgium: Antwerp Deurne, Antwerp Zuid, Boom, Brugge, Brussels, Geraardsbergen, Koksijde, Merchtem, Ninove, Oostende, Ruisbroek Pruus
Croatia: Zagreb
Cyprus: Limassol, Nicosia
Czech Republic: Brno, Budweis, Karlovy Vary, Ostrava, Prague
Denmark: Copenhagen
Finland: Espoo, Helsinki, Hameenlinna, Joensuu, Jyvaskyla, Kajaani, Kemi, Kokkola, Kotka, Kouvola, Kuopio, Lahti, Lappeenranta, Mariehamn, Mikkeli, Oulu, Pietarsaari, Pori, Rauma, Rovaniemi, Salo, Savonlinna, Seinajoki, Tampere, Turku, Vihti
France: Bordeux I, Fort-de-France P, Paris Seine Centre, Paris South Suburbs
Germany: Berlin (4), Chemnitz, Dortmund, Dresdend, Dulmen, Erfurt, Frankfurt (3), Hamburg (2), Hannover (2), Magdeburg, Mainz, Munich Gut_Grosslapen, Nuremburg, Rostock, Saarbrücken (2), Stuttgart
Great Britain: Bristol, London
Greece: Athens, Eleusis, Mytilene, Thessaloniki
Iceland: Reykjavik Klettagardar
Italy: Bozen, Milan
Lithuania: Kaunas, Klaipeda, Vilnius
Latvia: Riga
Malta: Malta
Netherlands: Amsterdam, Eindhoven, Neiuwegein Ijssel, Oudewater, Utrecht
Norway: Oslo
Poland: Krakow P
Portugal: Almada, Lisbon, Porto
Romania: Cluj Napoca
Serbia: Belgrade, Novi Sad
Spain: Barcelona, Castellon, Madrid VdlV, Molina de Segura, Santiago, Valencia (3)
Sweden: Gothenburg, Gavle, Sandviken, Stockholm (2), Soderhamn, Umea, Uppsala
Switzerland: Basel, Berne, Geneva, Lausanne, Lugano, St. Gallen Hofen, Zurich
Slovenia: Domzale-Kamnik, Koper, Ljubljana, Maribor, Novo mesto, Velenje
Slovakia: Bratislava (2), Piestany
Turkey: Adana, Istanbul I, Istanbul (II-VII)
Dates of data collection:
The data was collected over a period of 9 years (from 2011-2020).
Data collection methods:
Samples were collected daily over the above specified time period. An automated sampling device was used to collect 24 composite samples for each hour of the day. Samples were collected between 8am-10am the following day and combined to create a representative 24-h composite sample. The concentrations of target residues (in nanograms/litre) were obtained through off and on-line solid phase extraction with polymeric cartridges, and liquid chromatography coupled to mass spectrometry. The drug use for each city was then estimated by multiplying the concentration of each target drug reside with the corresponding flow of sewage (litre/day) and a correction factor (obtained by considering the average excretion rate of a given drug residue and the molecular mass ratio of the parent drug/metabolite) and then dividing by the population served by the wastewater treatment plant. The resulting number represents the amount of substance consumed per day per 1000 people. Averages of substance consumption were found for the entire week, all weekdays, and the weekend for every year of the study.
Ethical Statement:
Guidelines outlining the main ethical risks for wastewater research and strategies on how to mitigate these risks can be found here: https://score-cost.eu/wp-content/uploads/sites/118/2016/11/WBE-ethical-guidelines-FINAL-March-2016-.pdf.
Preparing the data:
For the purpose of our study, only Countries for which drug metabolite data was available in 6 or more cities were included. This encompossed 8 countries (Austria, Belgium, Finland, Germany, Sweden, Switzerland, Spain, and Slovenia) and 90 cities. Additionally, the time period was narrowed down to 2011-2018. To ensure more even comparisons, this study followed a blocking design where 5 cities were randomly selected from each of the 8 remaining countries. To randomly select 5 cities from each country, first a sampling frame was constructed in excel where each of the 90 cities were numbered and paired with their corresponding country. This sampling frame was input into R studio (with the read.csv() function) in order to create a series of subsamples for each country. To create the subsamples, the function sample() was used with the arguments x (the range of numbers of the cities in the country of interest), size (how many cities should be in each subsample), and replace (can the same city be selected twice). To get the total sample population for our study, we then used the rbind() function to join together the newly created subsamples. The data frame containing the total sample population was then exported as a .csv file using the write.csv() function so that it could be combined with the drug metabolite data for each randomly selected city and used in downstream analysis. For ease of analysis, only the weekday mean of drug use was considered.
GDP Data:
Intent of the data:
To determine the gross domestic product (GDP) of 266 countries/regions around the world.
GDP is the sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products. It is calculated without making deductions for depreciation of fabricated assets or for depletion and degradation of natural resources.
Please see the World Bank website for more information on the limitations of using GDP in data analysis.
Data type:
This dataset takes the form of categorical data.
Data acquisition:
The raw data set was accessed via The World Bank website (https://data.worldbank.org/indicator/NY.GDP.MKTP.CD)
Study site(s):
The study sites in this data set are comprised of 266 different countries/regions around the world. Please see https://databank.worldbank.org/reports.aspx?source=2&series=NY.GDP.MKTP.CD&country= for a complete list.
Dates of data collection:
The data was collected yearly for a period of 80 years (from 1960-2020).
Data collection methods:
This dataset was based on World Bank national accounts data and OECD National Accounts data files. Data collection methods include: population censuses, household surveys, national accounts, purchasing power parity surveys, income surveys, and agricultural censuses.
Country data is collected in the local currency units, and then converted to the constant price of the base year as determined by each country. For GDP, these values are then converted into constant 2010 USD with gap filling of missing values. Missing data are imputed based on the relationship of the sum of available data to the total in the year of the previous estimate.
Country Specifics
Austria: Latest population census conducted in 2011, 1 (1999) euro = 13.7603 Austrian schilling, 2015 = the National Accounts reference year, latest income survey conducted in 2015, latest agricultural census conducted in 2010.
Belgium: Latest population census conducted in 2011, 1 (1999) euro = 40.3399 Belgian franc, 2015 = the National Accounts reference year, latest income survey conducted in 2015, latest agricultural census conducted in 2010.
Finland: Latest population census conducted in 2011, 1 (1999) euro = 5.94573 Finnish markkka, 2015 = the National Accounts reference year, latest income survey conducted in 2015, latest agricultural census conducted in 2010.
Germany: Latest population census conducted in 2011, 2015 = the National Accounts reference year, latest income survey conducted in 2015, latest agricultural census conducted in 2010.
Sweden: Latest population census conducted in 2011, 2015 = the National Accounts reference year, latest income survey conducted in 2015, latest agricultural census conducted in 2010.
Swizterland: Latest population census conducted in 2011, 2015 = the National Accounts reference year, latest income survey conducted in 2015, latest agricultural census conducted in 2010.
Spain: Latest population census conducted in 2011, 1 (1999) euro = 166.386 Spanish peseta, 2015 = the National Accounts reference year, latest income survey conducted in 2015, latest agricultural census conducted in 2009.
Slovenia: Latest population census conducted in 2018, 1 (1999) euro = 239.64 Slovenian tolar, 2015 = the National Accounts reference year, latest income survey conducted in 2015, latest agricultural census conducted in 2009, latest household survey conducted in 2003.
Preparing the data:
For accurate comparison, the original data set was narrowed down to only include the countries and years considered by our drug metabolite data.
Age data
Intent of data:
The intent of this dataset was to document the percentage of the population by age group for 235 countries across the world.
Data type:
This data set takes the form of categorical data.
Data acquisition:
This data set was accessed from the United Nations Department of Economic and Social Affair’s website (https://population.un.org/wpp/Download/Standard/Interpolated/).
Study site(s):
The study sites within this data set encompass 235 countries from across the world. Please refer to (https://population.un.org/wpp/Download/Metadata/Documentation/) for a full list of the countries included.
Dates of data collection:
The data was collected annually for a period of 70 years (from 1950-2020).
Data collection methods:
The data collection methods were comprised of a mix of census data, population registers, and official estimates.
Country Specifics
Austria: Total population and distribution by age and sex estimated to be consistent with the population by age and sex of the 1951, 1961, 1971, 1981, 1991, 2001, 2011 censuses; population register through 2017; official estimates through 2017.
Belgium: Total population and distribution by age and sex estimated to be consistent with the population by age and sex of the 1961, 1970, 1981, 1991, 2001, 2011 censuses; population register through 2017; official estimates through 2017.
Finland: Total population and distribution by age and sex estimated to be consistent with the population by age and sex of the 1950, 1960, 1970, 1975, 1985, 1990, 2000, 2010 censuses; population register through 2017; official estimates through 2017.
Germany: Total population and distribution by age and sex estimated to be consistent with the population by age and sex of the 1987, 2001, 2011 censuses; population register through 2018; official estimates through 2018.
Sweden: Total population and distribution by age and sex estimated to be consistent with the population by age and sex of the 1950, 1960, 1965, 1970, 1975, 1980, 1985, 1990, 2011 censuses; population register through 2018; official estimates through 2018.
Swizterland: Total population and distribution by age and sex estimated to be consistent with the population by age and sex of the 1950, 1960, 1970, 1980, 1990, 2000, 2010 censuses; official estimates through 2017.
Spain: Total population and distribution by age and sex estimated to be consistent with the population by age and sex of the 1950, 1960, 1970, 1981, 1991, 2001, 2011 censuses; official estimates through 2017.
Slovenia: Total population and distribution by age and sex estimated to be consistent with the population by age and sex of the 1953, 1961, 1971, 1981, 1991, 2002, 2011, 2015 censuses; population register through 2017; official estimates through 2017.
Preparing the data:
For accurate comparison, the original data set was narrowed down to only include the countries and years considered by our drug metabolite and GDP data. For ease of analysis, only information on the percentage of the population within the age group 15-24 (for each Country) was extracted from the original data set. A separate file with the above changes was created in excel for further analysis in R studio.
Codebook:
Units
Drug metabolites (amphetamines, methamphetamines, cocaine, MDMA): mg/1000 people/day
GDP: 2010 USD
Population: Percentage of total population that is 15-24
Extra tools/software:
Github, excel, R Studio
Wastewater Based Epidemiology (WBE)
Wastewater based epidemiology (WBE) is a chemical analysis technique used to detect a myriad of compounds for regions serviced by wastewater treatment facilities. The functions of this technique range from monitoring environmental impact of household liquid waste to the very topical application of detecting the presence of Covid-19 (Glassmeyer et al., 2005), (Bentacourt et al., 2021).
Wastewater data account for response bias by indiscriminately collecting wastewater from a given locality. It is broad in the sense that only the averages for a population can be gleaned and will not reveal the true distribution of drug use for individuals within in a population. However, given the scale, consistency, and reliability of the method, these wastewater data can be paired with other demographic data for multiple locations to tease out potential factors affecting drug consumption (Castiglioni et al., 2014).
In order to estimate levels of drug use from wastewater, researchers first identify and quantify drug residues. Then, concentration is determined by measuring the daily flow rate for the location the sample was taken. Next, a correction factor is applied to each drug to account for changes in structure and quantity after metabolism. The back-calculated total quantity of consumption is then divided by the population for the area serviced by the treatment facility. And finally, an average is produced for each drug consumed. This method is not novel but has never before been applied on such a large geographic and temporal scale. Indeed, this represents a large scale coordinated effort to quantify drug use patterns for critical populations. The consistency exercised throughout the study lends credence to the undertaking, making it perfectly adaptable to further investigations. This dataset therefore served as the primary scaffolding for our own investigation. With so many geographic domains represented through time, we were interested in determining if these wastewater data could be used with other known variables to explore potential relationships between drug use and demographic characteristics of a population.
The countries from this set that we chose to include in this study were; Spain, Sweden, Switzerland, Belgium, Finland, Austria, Germany, and Slovenia.
Project Packages
| Package | Description | Link |
|---|---|---|
| dplyr | Data tidying package | https://cran.r-project.org/web/packages/dplyr/index.html |
| ggfortify | Data visualization package | https://cran.r-project.org/web/packages/ggfortify/index.html |
| ggplot2 | Data visualization package | https://cran.r-project.org/web/packages/ggplot2/index.html |
| lattice | Data visualization package | https://cran.r-project.org/web/packages/lattice/index.html |
| lubridate | Data tidying package | https://cran.r-project.org/web/packages/lubridate/index.html |
| plotrix | Data visualization package | https://cran.r-project.org/web/packages/plotrix/index.html |
| readr | Data tidying package | https://cran.r-project.org/web/packages/readr/index.html |
| stringr | Data tidying package | https://cran.r-project.org/web/packages/stringr/index.html |
| tidyr | Data tidying package | https://cran.r-project.org/web/packages/tidyr/index.html |